Constraints on workflow paths, part 2

Continuing the effort to express constraints on workflow paths from the previous post, here is some practical advice for those who wish to undertake something similar, or simply retrace the same road.

First of all, I settled on SPOT, for the following reasons: its on-line LTL-to-TGBA translator, as they call it, makes it easy to experiment; it can help you rephrase your LTL using automatic simplifications; it can output many different visual formats; its standalone executable can output standard LBTT; and, something I found quite helpful, it actually understands PSL in addition to LTL. PSL is a language “for specifying properties or assertions about hardware designs”. PSL supports SEREs (Sequential Extended Regular Expressions), which are formed like regular expressions and can be combined with LTL expressions using special binding operators. Even though the whole expression can ultimately be expressed fully in plain LTL (and SPOT can formulate it that way for you), I found it easier to start with a SERE.

Which brings me to the formulation I arrived at for the property that troubled me at the end of the previous post: G(soi -> X(!input U (input & Fcoi))) (which SPOT can also output in fancy Unicode: □(○(¬input U (input∧◇coi))∨¬soi)). Read it as: whenever soi occurs, no input occurs from the next step onwards until an input does occur, and a coi eventually follows from that point. Its Büchi automaton is the following:

[Figure: the Büchi automaton for G(soi -> X(!input U (input & Fcoi)))]

And here it is on the Web. Looks daunting, but you get used to it after a while. Let me show you how to produce it from the command line (Cygwin shell, in my case).

ltl2tgba -f 'G(soi -> X(!input U (input & Fcoi)))' --lbtt --ba

The LBTT output is the following.

4 2t
0 1
0 0 1 -1 ! soi
1 0 1 -1 soi
-1
1 0
0 0 1 -1 & & ! soi input coi
1 1 -1 ! input
1 0 1 -1 & & soi input coi
2 0 -1 & & soi input ! coi
3 0 -1 & & ! soi input ! coi
-1
2 0
0 0 1 -1 & & ! soi input coi
1 1 -1 & ! input coi
1 0 1 -1 & & soi input coi
2 -1 & ! input ! coi
2 0 -1 & & soi input ! coi
3 0 -1 & & ! soi input ! coi
-1
3 0
0 0 1 -1 & ! soi coi
1 0 1 -1 & soi coi
2 0 -1 & soi ! coi
3 0 -1 & ! soi ! coi
-1

To connect the two, I have prepared an image where I link the first few transitions in the LBTT output to those on the automaton image. (Each state block starts with the state number and an initial-state flag and ends with -1; each transition line reads: destination state, the acceptance sets the transition belongs to, a -1 terminator, and the guard in prefix notation.)

[Figure: the first few LBTT transitions annotated on the automaton image]

You can take it from there…

The point of this exercise is the following: if you express your property in LTL and you’re prepared to jump through a few hoops, it is possible to invoke SPOT and read back a Büchi automaton for it, which you can use either to check your model or as a run-time constraint checker. There are ways to interface with SPOT so that you can provide your state space, as they call it, and SPOT will verify whether its intersection with the automaton is empty or not; I have not explored that further. But using the resulting automaton as a generator for constraint-checking run-time infrastructure seems like exactly what I have wanted to do in Pandamator ever since I read about LTL.
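To make that concrete, here is a minimal sketch (in Python, with names of my own invention) of driving such an automaton as a run-time checker. The transition table is transcribed by hand from the LBTT output above; in a real setup it would be generated by parsing that output. One caveat: an empty state set catches safety violations, whereas the eventuality hidden in Fcoi (what the Büchi acceptance sets are for) needs extra bookkeeping, such as the end-of-trace check sketched at the bottom.

from typing import Callable, Dict, List, Set, Tuple

Valuation = Dict[str, bool]            # truth values of soi, input, coi at one step
Guard = Callable[[Valuation], bool]

# Hand-transcribed from the LBTT output: TRANSITIONS[state] = [(guard, destination)].
TRANSITIONS: Dict[int, List[Tuple[Guard, int]]] = {
    0: [(lambda v: not v["soi"], 0),
        (lambda v: v["soi"], 1)],
    1: [(lambda v: not v["soi"] and v["input"] and v["coi"], 0),
        (lambda v: not v["input"], 1),
        (lambda v: v["soi"] and v["input"] and v["coi"], 1),
        (lambda v: v["soi"] and v["input"] and not v["coi"], 2),
        (lambda v: not v["soi"] and v["input"] and not v["coi"], 3)],
    2: [(lambda v: not v["soi"] and v["input"] and v["coi"], 0),
        (lambda v: not v["input"] and v["coi"], 1),
        (lambda v: v["soi"] and v["input"] and v["coi"], 1),
        (lambda v: not v["input"] and not v["coi"], 2),
        (lambda v: v["soi"] and v["input"] and not v["coi"], 2),
        (lambda v: not v["soi"] and v["input"] and not v["coi"], 3)],
    3: [(lambda v: not v["soi"] and v["coi"], 0),
        (lambda v: v["soi"] and v["coi"], 1),
        (lambda v: v["soi"] and not v["coi"], 2),
        (lambda v: not v["soi"] and not v["coi"], 3)],
}

def step(states: Set[int], v: Valuation) -> Set[int]:
    """All states any run of the automaton could be in after observing v."""
    return {dst for s in states for guard, dst in TRANSITIONS[s] if guard(v)}

def violated(trace: List[Valuation]) -> bool:
    states = {0}                       # state 0 is the initial state in the LBTT output
    for v in trace:
        states = step(states, v)
        if not states:
            return True                # no run survives: the constraint is violated
    return False

def owes_coi(states: Set[int]) -> bool:
    """End-of-trace check: by my reading of the automaton, states 1-3 are
    only reachable while some soi has not yet been answered by input & coi."""
    return bool(states) and states <= {1, 2, 3}

# soi, then a quiet step, then an input step that also carries coi: no violation.
trace = [{"soi": True,  "input": False, "coi": False},
         {"soi": False, "input": False, "coi": False},
         {"soi": False, "input": True,  "coi": True}]
print(violated(trace))                 # False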

I’m happy to close this post with a positive feeling! It’s not the end of the story, I promise you…


Constraints on workflow paths

I have mixed feelings about the current post. For the specific problem at hand, I had intended to develop a series of solutions, the one presented here being just the first, soon to be replaced by a better one. However, for reasons I will go into later, I have settled for this one for the time being.

The domain of discourse is workflow specifications, and how to verify them: wide and under-specified enough to make for an interesting problem. My first thought for a quick-and-dirty solution, which would give some immediate results until the more elaborate ones brewed, was to develop a program to evaluate some pertinent constraints on possible paths through the workflow, and my language of choice was Prolog. The reason is the painless facilities it has for state-space exploration: backtracking, incremental elaboration of data, pattern-matching through unification, and conciseness. Plus, back in the ol’ days, I considered myself a decent Prolog programmer. Now, it could be debated whether the tag “Esoteric” is really suitable for Prolog. After all, it has been around for more than forty years, and most universities teach it. However, it remains impenetrable for the uninitiated, and I don’t think many people actually learn it, even if they are taught about it.

For reasons unknown, I had to start with Visual Prolog. Probably because I had avoided Turbo Prolog when it came out, and I never had more than a casual glance at Mercury, Oz and other multi-paradigm languages. That phase lasted the better part of two days; then I got fed up with the nagging message at startup and went back to the less flashy SWI-Prolog. To make a long story short, what I did was the following: my program loops through all of our workflows, reads them as Prolog facts and then runs the actual verification code on each one. The focus of this post is that verification code, which runs on the graphs encoding the workflows.

Each verification rule looks for counterexamples of the property it encodes, which are paths through the graph. It consists of the following components: a selector of initial nodes, a filter that limits the edges taken from each node, and a selector of final nodes. Other than limiting the search space with the aid of those components, exploration is exhaustive, since we’re looking for counterexamples.

For example, the following clauses of the filter predicate for a particular rule encode some workflow conditions on having, or not having, passed through a prior workflow step (and the lack of syntax highlighting for Prolog from WordPress certainly justifies the “Esoteric” tag).

% Allow an arc into node N if N requires having passed through step I
% and I appears among the already-visited nodes V.
allowed_arc(N,V) :-
	stepcond(N,through(I)),
	member(I,V),
	!.
% Forbid the arc if N requires NOT having passed through step I
% and I appears among the visited nodes.
allowed_arc(N,V) :-
	stepcond(N,not_through(I)),
	member(I,V),
	!, fail.

As you can see, the filtering predicate takes as parameters the node to go to and the sequence of previously visited nodes. The filtering predicate is called by the predicate possible_path, which computes the path.

% Y is a possible successor of X if X jumps to it explicitly,
% or if Y is a step waiting for input (reachable from any other step);
% in both cases the filter predicate Pr must allow the move.
possible_next(V, Pr, X, Y) :-
	stepact(X,goto(Y)),
	apply(Pr,[Y,V]).
possible_next(V, Pr, X, Y) :-
	stepcond(Y,input(_)),
	X \= Y,
	apply(Pr,[Y,V]).

% A path has been found when the current node is the target;
% the accumulator V (built in reverse) becomes the result P.
possible_path(V,_,X,X,P) :-
	reverse(V,P).
possible_path(V, Pr, X, Z, P) :-
	possible_next(V,Pr,X,Y),
	possible_path([Y|V],Pr,Y,Z,P).

Evaluating the rule is done with the help of findall.

findall(
	P,
	(stepact(F,soi),
		possible_path([F],allowed_arc,F,Z,P),
		length(P,L),
		L > 1,
		stepact(Z,soi)),
	X4)

It needs a second look, so let me break it up.

stepact(F,soi) Start from every action matching the particular fact pattern.

possible_path([F],allowed_arc,F,Z,P) Enumerate all possible paths using the particular filter.

length(P,L), L > 1, stepact(Z,soi) Stop for all paths of length > 1 whose final node matches the particular fact pattern.

A brute-force solution, but it managed to sniff out several potential problems in the workflows, and it has room to grow with new rules. So why did I say that its only purpose was to pave the way for more elaborate solutions?

My hope was to use formal verification methods, and I had every reason to be hopeful because I had multiple candidates to try.

The first one was UPPAAL, a tool which allows one to specify timed automata, simulate them and verify invariants expressed in CTL. Sounds cool, doesn’t it?

[Figure: UPPAAL screenshot]

The problem is, encoding your system this way is not an easy task, and probably not something that could be done by automatically translating a workflow specification. People who used it independently had trouble adapting the tool to their problem and ended up playing to the strengths of the tool instead. It’s not out of the question, but it needs a lot of time and dedication, and it is not a short-term study.

My other hope was to express the path constraints in LTL, instead of the unstructured way I was using. That necessitated a lot of research into how LTL can be verified and what tools exist to aid one in this endeavor. It turns out there are several, and some of them have even been put online for people to try! Check out LTL2BA and SPOT. And, if you persevere through the various tortures imposed on you by the intricacies of cross-platform C/C++ builds, you can actually find yourself with real singing-and-dancing LTL2BA and SPOT executables that you can invoke. Best of all, they can both be made to output the (semi-?)standard LBT format. To give you an idea, the formula G p3 | (F p0 & F p1) results in the following Büchi automaton (yes, that’s what it’s called), which can be used to verify it at run-time.

[Figure: the Büchi automaton for G p3 | (F p0 & F p1)]
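For reference, the automaton should be reproducible with the same kind of invocation as in the follow-up post above (SPOT shown here; LTL2BA accepts a formula through a similar -f flag):

ltl2tgba -f 'G p3 | (F p0 & F p1)' --ba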

This was something I had been considering for Pandamator as well, to express constraints over whole object lifetimes. As I found out, it was not a novel idea, but maybe I’d have the honor of implementing it first in an accessible way. It would need some elaborate machinery: abstract away all logic that is not LTL (so that the formulas to translate have simple propositional variables), invoke an external program, read back the automaton and fill the actual logic conditions back in. That would replace my hodge-podge filters and selectors with something having real logical backing. The problem is, writing the darned LTL constraints proved to be anything but evident!

For example, how would the following be written: “After action p0, action p2 cannot appear unless condition p1 appears between them”? My guess would be p0 -> X G ((!p1 & !p2) U (p1 U p2)). However, the automaton does not seem to be suitable.

[Figure: the Büchi automaton for the guessed formula]
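For what it’s worth, a formulation worth trying might come from the specification-pattern literature (Dwyer et al.), where “absence of p2 after p0, until p1” is usually written with the weak-until operator: G((p0 & !p1) -> (!p2 W p1)), with p W q standing for (p U q) | G p. SPOT understands W directly. I have not verified that this captures the intended meaning in every corner case, so take it as a hint rather than an answer.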

This is my current situation… Maybe an intermediate solution would be to formulate an automaton directly, and use it as the specification of the constraint.

I won’t pretend I know the answer. I’m in the dark, but I’m determined to see this through. And, in the meantime, my lowly Prolog program can actually chug along and produce useful results…

Strong Typing? It’s not really strong enough…

Maybe it’s just age, but I find myself slipping into being less and less tolerant of weak type-systems. Which may come as a surprise, given my involvement in TinyScheme, but I don’t see it as a contradiction. Scripting is a different kind of work from writing multi-module, complicated software, and it’s nice to be able to use a language that can express the Y combinator for the former. For the latter, though, I am becoming more and more autocratic. I want to deprive my programs of all freedom to do as they please. I frown when I see mission-critical software written in Python. I turn away when I see customer-facing software written in Perl. And I growl when I see big programs written in PHP. I’d lose sleep if I thought there were a place in my code ready to do something silly when I least expect it, “silly” being a synonym for “something a typechecker would catch right away”. And I’m not talking about preconditions, postconditions and assertions, because those still take action at run-time, although Bertrand Meyer’s Eiffel was quite revolutionary in bringing them forth twenty-some years ago, given that they are not mainstream concepts even now. What I’m talking about is incorporating more programmer intent into the type-system proper.

I want to introduce you to F* (F-Star), which you will find familiar-looking if you’re coming from F# (if you’re coming from Haskell, have a look at Agda, which covers similar ground). You can play with it on Rise4Fun in a simple browser, without going to the trouble of installing anything. If you’re into .Net in general, have a look at Code Contracts, too; they are now being incorporated into the development ecosystem, so maybe they’ll become mainstream. They use the same verification engine as F* (Z3), and they are available right now (there are Code Contracts examples on Rise4Fun, as well). But I’m going to follow the F*/Agda direction in this post.

F* adds types of types (called kinds), so type constructors are typed (kinded) themselves. For example, “list” is actually a type constructor of kind * => *.

type list :: * => * =
   | Nil : 'a::* -> list 'a
   | Cons : 'a::* -> 'a -> list 'a -> list 'a

This is just a new point of view on usual functional notions. Here comes a new thing: term-indexed (or dependent) types. The following type is the length-indexed list, of kind int => * => *, meaning that it constructs a *-kinded type from an int-typed term and a *-kinded type.

type ilist :: int => * => * =
   | INil  : ilist 0 'a
   | ICons : x:'a -> n:int -> ilist n 'a -> ilist (n + 1) 'a

The real fun begins when you create functions of dependent types, like the following function to append two length-indexed lists (the Rise4Fun tutorial has these examples and more).

val iappend: n1:int -> ilist n1 'a -> n2:int -> ilist n2 'a -> ilist (n1 + n2) 'a
let rec iappend n1 l1 n2 l2 = match l1 with
   | INil -> l2
   | ICons x n1_minus_1 l1tl ->
     ICons x (n1_minus_1 + n2) (iappend n1_minus_1 l1tl n2 l2)

Something more is evident in the function signature, compared to that of a usual appending function: you can actually read, in the signature, the constraint that the length of the list this function produces is the sum of the lengths of its two list arguments. This is just a minimal example, and I’m afraid the full power of term-indexed types will not be evident from the F* tutorial alone. Material on Agda has more pointers on how one can embed propositional logic in dependent types. I hope you’re as excited as I am by this prospect: declarative constraints (like those of Code Contracts) could be expressed in the type-system itself, and functions would actually prove that what they do has the intended properties, purely by performing type-correct operations!

Please go and play with the F* tutorial. It will not help you in your everyday job (but if you do have a job where F* can help you, please be kind enough not to tell me…), but it will open your mind to things to come. When I was reading the Gentle Introduction to Haskell twenty years ago, I did not know that I’d actually be using F# and closures in C# twenty years later.

But be prepared for a rough ride, because you won’t have the support you enjoy in the mainstream. For example, while trying to write the following function to map a length-indexed list to another, I stumbled on type errors, which, of course, is the idea.

val imap: n:int -> ('a -> 'b) -> ilist n 'a -> ilist n 'b
let rec imap (n:int) (f:'a->'b) (l1:ilist n 'a) = match l1 with
   | INil -> INil
   | ICons x n1 l1tl -> ICons (f x) n1 (imap n1 f l1tl)

Even so, getting errors like the following is not really helpful.

input(13,45-14,0) : Error : Expected an expression of type:

ILIST1.ilist x_11_2 'x_10_1

but got (x_13_2): ILIST1.ilist x_13_1 'x_10_1

Type checking failed: ILIST1

Well, that’s why it’s called the “bleeding edge”…

What was the number of X during that month?

It has probably happened to you, too. A non-technical person comes with specifications for a monthly report, in the form of a bullet-list of column descriptions. All of them seem to be counts of various events but… wait! There is one column that says “Active Customers”, or something to that effect. This post is not about how to explain to non-technical people that additive counts and counts that can only be calculated at a particular instant do not mix, or about how to arrive at a compromise and agree on a particular instant at which to calculate those quantities, or about how to keep clarifying each month what these numbers mean, because they will keep forgetting. Even though these topics might have been of interest, because one must deal with them at one point or another, this post is more ambitious. I will forgo all these matters and push the question to its limit: what is the real meaning of “Number of X during that month”, if one were so cruel as to skip all negotiations and produce a report of exactly that? For simplicity, I will use the “AdventureWorks” sample database as common ground.

The object of discourse will be the “number of open WorkOrders”. What could the “number of open WorkOrders” across time possibly mean? The number of open WorkOrders (those between their StartDate and their EndDate) is a constantly shifting number. At a granularity of a day (the actual granularity of timestamps in this table), this number potentially changes every day. To describe what the real answer to this question is, I will now introduce some terminology from the field of Temporal Databases, and I will use concepts and software from my open-source project in that field, Pandamator. You have to use at least the HEAD revision at the time of writing this post (rev. 44), because grouping queries without a GROUP BY clause were not handled in earlier revisions.

Relations that carry validity intervals with them, like WorkOrder, are commonly called state relations, in contrast to the relations one uses in a regular relational database, which encode the current state of the world and are called snapshot relations. You can think of a state relation as encoding a snapshot relation across time, or of a snapshot relation as what a state relation encodes for a particular point in time. For example, a WorkOrder whose validity interval runs from 2004-07-10 to 2004-07-14 is present in the snapshot of every day in that interval. This dualism is the basis of sequenced queries, which give you an answer at every instant in time (i.e. a state relation). Even though they are a small subset of the temporal queries one might like to ask, they are a useful subset that has been researched in depth and actually formed the basis of the proposed temporal additions to the SQL standard (SQL/Temporal).

Pandamator supports sequenced queries (actually, only sequenced queries, for now), so let’s see what this count looks like as a sequenced query. In fact, there are a number of steps to go through before one can produce an answer, which I document here in case you want to follow along on your own:

  • Pandamator is a bit inflexible as to the naming of the validity-period columns, so one has to rename the original columns of the WorkOrder table and make sure the validity intervals are not NULL. For simplicity, and because I did not plan to run any update statements on the table, I just added two computed columns with the expected names: ValidFromDate and ValidToDate (the latter being case when [EndDate] IS NULL then '3000-01-01' else [EndDate] end).
  • Using Query Express for Pandamator (branched off the Query Express project), one has to add temporal support to the AdventureWorks database (there’s a button for that).
  • One then has to declare the WorkOrder table as a state relation, with the following command:
alter table Production.WorkOrder add validtime(STATE);

We are now ready to count WorkOrders. The query is:

validtime select count(*) from Production.WorkOrder;

Running this query produces a state relation with the result:

[Figure: the resulting state relation in Query Express; the last three rows read 394, 210 and 138]

It does take more than half a minute on my machine but, then again, what it computes is not so simple when translated into plain SQL (you can see the translation in the “Messages” tab):

WITH spanboundaries(TheDate) AS (
(SELECT ValidFromDate AS TheDate
FROM Production.WorkOrder)
union
(SELECT ValidToDate AS TheDate
FROM Production.WorkOrder)
),
spans(ValidFromDate,ValidToDate) AS (
SELECT a.TheDate AS ValidFromDate,b.TheDate AS ValidToDate
FROM spanboundaries AS a
INNER JOIN spanboundaries AS b ON a.TheDate<b.TheDate and not exists(SELECT 0
FROM spanboundaries AS c
WHERE a.TheDate<c.TheDate and c.TheDate<b.TheDate)
),
R(t0,ValidFromDate,ValidToDate) AS (
SELECT count(*)AS t0,spans.ValidFromDate AS ValidFromDate,spans.ValidToDate AS ValidToDate
FROM spans
INNER JOIN Production.WorkOrder ON WorkOrder.ValidFromDate<=spans.ValidFromDate and spans.ValidToDate<=WorkOrder.ValidToDate
GROUP BY spans.ValidFromDate,spans.ValidToDate
),
CoalescedAux_R(t0,ValidFromDate,ValidToDate) AS (
(SELECT p2.t0 AS t0,p2.ValidFromDate AS ValidFromDate,p2.ValidToDate AS ValidToDate
FROM R AS p2
WHERE not exists(SELECT 0
FROM R AS p1
WHERE(p1.t0=p2.t0 or coalesce(p1.t0,p2.t0) is null)and p1.ValidToDate = p2.ValidFromDate))
union all
(SELECT p2.t0 AS t0,p1.ValidFromDate AS ValidFromDate,p2.ValidToDate AS ValidToDate
FROM CoalescedAux_R AS p1
INNER JOIN R AS p2 ON p1.ValidToDate = p2.ValidFromDate and(p1.t0=p2.t0 or coalesce(p1.t0,p2.t0) is null))
),
Coalesced_R(t0,ValidFromDate,ValidToDate) AS (
SELECT p1.t0 AS t0,p1.ValidFromDate AS ValidFromDate,max(p1.ValidToDate)AS ValidToDate
FROM CoalescedAux_R AS p1
GROUP BY t0,ValidFromDate
)
SELECT Coalesced_R.t0 AS t0,Coalesced_R.ValidFromDate AS ValidFromDate,Coalesced_R.ValidToDate AS ValidToDate
FROM Coalesced_R
ORDER BY ValidFromDate,ValidToDate OPTION(MAXRECURSION 0);

Pretty scary, huh? You can read about the translation in the docs but, briefly: spanboundaries collects all the interval endpoints, spans forms the elementary intervals between consecutive endpoints, R counts the WorkOrders covering each span, and the two Coalesced steps merge adjacent spans carrying the same count. Let’s go on and verify some of the values, to make sure this complicated query actually does what it promises.
Here are three queries to produce the numbers during the last three validity periods depicted:

SELECT count(*) from Production.WorkOrder
where not (EndDate <= '2004-07-13' or StartDate >='2004-07-14')

SELECT count(*) from Production.WorkOrder
where not (EndDate <= '2004-07-14' or StartDate >='2004-07-16')

SELECT count(*) from Production.WorkOrder
where not (EndDate <= '2004-07-16' or StartDate >='2004-07-18')

Spend some time verifying the condition, to convince yourself that it does capture open WorkOrders. The numbers produced for these three periods are, indeed, 394, 210 and 138.

Next time someone comes up with a report across time that asks what is essentially a snapshot question, instead of explaining the fallacy you can take some comfort in the mental image of yourself throwing the results of a sequenced query into their lap and leaving them to drown. I guess it would suit them well!

APL, J, NIAL and Array Theory

When I was in France, I was doing an internship with Jean-Louis Laurière at a company that dealt with payroll software and services. The idea was to express all payroll logic in a rule-based system, and he had enlisted a senior person, very knowledgeable on the subject, to explain some aspect of it over lunch. That person was a very experienced COBOL programmer and, at one point, after he had explained some calculation using nested loops, I tried to summarize:

– So, this is the Min of x and the Max of y and z.

I got a blank stare and later, when we were alone, Jean-Louis said he had almost burst into laughter at that moment, because it was obvious the man did not understand what I was saying. Mind you, he knew perfectly well how to calculate the number in question procedurally, but he lacked the concepts to express it in the succinct way I was using. He was working at a lower level of abstraction and, since there seems to be a limit on the number of “chunks” we can hold in our heads, the higher the level of abstraction we use, the further our minds can reach.

Those of you who share my fascination with Functional Programming probably like it for the same reason: you can express so darned much more if you use a higher level of abstraction (higher-order functions, in this case). All the better if modern implementations perform so well that you can use this abstraction for real-life computations. This post is not about Functional Programming; there are a zillion sources to read about that. This post is about what higher abstractions might look like in the realm of array programming and, in particular, APL-like languages. In another post, I might write about the three languages that contested for the HPCS.

One thing you should understand while reading about those languages is that the APL branch of languages split away from the conventional trunk very early. I have found a copy of “A Programming Language” and I guess that a Fortran programmer reading it at the time must have felt like someone finding the manual for the Flux Capacitor in the Middle Ages. It’s a safe guess, because I can easily imagine a Java programmer of today feeling the same, since the concepts it is based on are as distant from Java as they are from Fortran. In fact, the Fortran programmer probably has an edge, since Fortran is ultimately concerned with manipulating array data in the first place.

I hope I have tickled your fancy enough to spend some time researching APL, but I’m going to propose an easier route: study J instead, which has a free implementation, can be written in ASCII (you must surely be wondering what APL is written in, if not ASCII), and has a very good introductory text written from the conventional viewpoint: “J for C Programmers” (it can be found in the J Books). This book is very important for the outsider, because all other sources seem to present J as the most natural thing on earth, which only your limited brainpower perceives as different. Sorry, no “J for Functional Programmers” exists, but this is no big deal, as array programming gets no different treatment in functional languages than in conventional ones, although you will surely have little voices in your mind whispering about higher-order functions while you read about J adverbs. Curiously, a presentation of array programming that seems closer to the language of the functional programmer can be found in the material for NIAL (download Papers.zip and don’t go to the root of the site), which I did not know but read about in the APL F.A.Q. Array Theory seems like a good theoretical underpinning for such languages.

Rule #34 of the internet supposedly says that, if something exists, there’s porn of it. Here’s some J porn for you. I once used this as a programming test, believing that someone who can learn enough J to answer it can learn anything else needed on the job. I was not let down.

Given a 9×9 array of integers from 1 to 9, forming the supposed solution to a Sudoku puzzle (http://en.wikipedia.org/wiki/Sudoku), write a J verb (program) that computes whether the 9×9 array is, indeed, a solution according to the following rule for Sudoku: every row, every column and every 3×3 tile (the 3×3 tiles are arranged in a 3×3 grid tiling the entire array; see the Wikipedia article) must contain every number from 1 to 9 exactly once.

The solution should use verb ranks to partition the array in all the needed ways (cells), and determine uniqueness by applying the same test verb in the appropriate manner.

And this is my own solution to that.

test3 =: monad define
                items =. ,y                                NB. ravel the cell into a plain list
                (9=#~.items) *. *./(items>0),(items<10)    NB. 9 distinct items, all between 1 and 9
)

is_sudoku3 =: monad define
                NB. rows, columns (via transpose) and 3x3 tiles (via ;._3 tessellation) must all pass test3
                *./ (test3"_1 y) , (test3"_1 |: y) , (,(3 3 ,: 3 3) test3;._3 y)
)

Update: At the time I devised this exercise I could not find any relevant article, but I just now found a full Sudoku solver in J at the J Wiki.

The Fallacy of OLTP

OLTP systems are commonly thought of as antithetical to Batch Processing systems, the two occupying opposite sides of a chasm. OLTP is epitomized by the Web Application paradigm (itself the modern-day equivalent of the time-sharing paradigm of days past), while Batch Processing is ostracized to long-running nightly jobs.

I am going to present a case that the two are synergistic, based on how to fulfill the requirements of an on-line system, rather than on aphorisms and conventional wisdom.

Requirements of an On-Line System

An on-line system is supposed to give an immediate, small answer to a user’s small request. The amount of data involved matters, as it pushes the design envelope towards minimizing latency, not throughput. A Batch Processing system, on the other hand, works with large amounts of data but has no hard limit on the response time for any particular part of the data: we are perfectly happy to wait two hours to process five million rows, not caring that no single row will be done before the rest.

An on-line system is also thought of as a synchronous, blocking one, although this is not a hard rule; in fact, modern Web applications commonly use background synchronous or asynchronous AJAX calls and AJAX polling to present a better user experience. That’s why the characteristic quality I’m going to focus on is the transaction-at-a-time nature of the processing.

Latency in the Real World

In the Goldilocks zone, a request-response system does minimize latency, as it time-shares the resources of the system among all concurrent users at once. We get R requests per second, achieving a response time of T on average. Now, let’s see what happens closer to the breaking point of a data-bound system.

Database-bound systems rely on connection pooling to throttle data operations. This is done because, when the star of RDBMSs started rising, it didn’t take long for people to realize that too many concurrent data operations essentially meant none at all. So a real system either does connection pooling or should have been doing so, and I’m assuming the presence of connection pooling in what follows (in fact, certain programming environments, like ASP.NET, implement connection pooling natively, albeit with default settings which might not be optimal).

How many are too many concurrent data operations? There’s no golden road to this answer. The real answer is “the number you measure after you test for it”. A rule of thumb, however, would be “far fewer than you think”[1].
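To put the throttling itself in concrete terms, here is an illustrative Python sketch (mine, not tied to any particular driver or pool implementation): 200 request threads gated down to 10 concurrent data operations with a semaphore, which is the same effect a real connection pool achieves at the connection level.

import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_DB_OPS = 10             # "far fewer than you think"
db_gate = threading.BoundedSemaphore(MAX_CONCURRENT_DB_OPS)

def run_data_operation(request_id: int) -> None:
    pass                               # the actual SQL work, elided

def handle_request(request_id: int) -> None:
    with db_gate:                      # at most 10 requests touch the database at once
        run_data_operation(request_id)

# Many more request threads than database slots: the rest wait at the gate.
with ThreadPoolExecutor(max_workers=200) as pool:
    for i in range(1000):
        pool.submit(handle_request, i)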

In any case, there’s a turning point at a specific number, either because you have reached the connection-pool size, or because you have started thrashing your RDBMS, in which case you’d better decrease the size of your connection pool accordingly. The turning point, as R increases, is when T suddenly rises because, either way, requests can no longer be serviced immediately. And this is when you need a paradigm shift to move forward.

Moving into the Gray Area

Once you reach a state where a number of would-be concurrent requests have to wait, and their number starts to dominate over the requests being serviced, you’re in the gray area between an on-line and a batch system. You have a number of pending requests that need to be serviced, and the minimal-latency constraint has already been compromised. It’s time to take advantage of the RDBMS, a piece of software which is, as I like to call it, essentially an SIMD machine.

Running SQL one row at a time is a costly operation, whose cost we gladly pay when we get back immediate responses and low latency; but the hard fact is that running essentially the same SQL on multiple rows at a time scales with a factor of a lot less than one[2].

To give an example: in an experiment I ran in January, transaction-at-a-time code (containing a lot of logic, amounting to many tens of SQL statements), invoked in a parallel fashion, was translated to batch processing and tuned to an optimal batch size (200, in this case). The threaded transaction-at-a-time code achieved 20 items per second and raised CPU usage to over 40%. The batch code achieved 95 items per second and kept CPU usage below 25%.

This was a particular case, and any other case would need its own experimentation and tuning to find the sweet spot. However, I think I can make the general statement that, once the throughput limit of a transaction-at-a-time system is breached, switching to batch processing can increase throughput and bring latency back down to the comfort zone.

No Free Lunch

Taking advantage of this does not come free. There is no automatic way to write the logic once and invoke it both ways, much less so when a lot of the logic lives in ordinary application code, which is written at the single-object level, unless you happen to use J (http://www.jsoftware.com/) or other non-mainstream languages, where the array is the natural level of granularity and the individual object is the special case.

With great pain, one can adapt application code in conventional languages to always pass arrays around, and SQL code to accept whole relations (e.g. SQL Server’s table-valued parameters to stored procedures). Then one can invoke the code with data sizes of one or more, according to the current workload, and have it both ways. Or implement two different versions of everything and switch between them.
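As a sketch of the “write it once, invoke it both ways” idea (in Python, with hypothetical table and function names; a DB-API connection is assumed):

from typing import Sequence

def process_orders(conn, order_ids: Sequence[int]) -> None:
    """All logic takes a collection; the on-line path is simply a batch of one."""
    cur = conn.cursor()
    # executemany amortizes per-statement overhead over the whole batch,
    # much like a table-valued parameter would on the T-SQL side.
    cur.executemany("UPDATE orders SET status = 'processed' WHERE id = ?",
                    [(oid,) for oid in order_ids])
    conn.commit()                      # one transaction for the whole batch

# On-line path: a batch of one.
#   process_orders(conn, [42])
# Batch path: chunk the backlog to the measured sweet spot (200, above).
#   process_orders(conn, backlog[:200])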

In any case, this is not for the faint of heart; it needs senior-level guidance for everyone involved, as well as the mentality to make no assumptions and to measure everything before arriving at conclusions.

[Figure: “SQL is an SIMD language”]


[1] Recent experience of mine, in a queue-servicing data-bound application: dropping the thread-pool size from 200 to 10 quintupled the processing rate.

[2] Quoting from http://technet.microsoft.com/en-us/library/hh393551.aspx: “In some cases, batched background processing can be more efficient than singleton transactions. Generally, batched transactions use fewer resources per unit of work because the overhead associated with transaction initiation and termination is amortized over a larger number of transactions. When using batched background processing, incorporate batch sizes that are consistent with other concurrent operations. For example, work on a few thousand rows at a time to mitigate lock escalation (for example, to table locks).”

Efficient queue polling on a database table

I know that database tables are supposed to be bad for queuing. Common advice is to use a messaging system, or asynchronous notifications, or both. But suppose for a moment that you have 24/7 queues that are rarely empty, so you mostly don’t need to worry about wasted polling, and you just need a lot of things to happen constantly. Let’s play Devil’s Advocate and see whether using a database for the task can be made less atrocious than it is supposed to be.

The standard features of a table used for polling are:

  • a sequential ID (or timestamp)
  • a “Status” column to distinguish processed from unprocessed entries
  • the payload data

The properties I seek for polling are:

  • cacheable and lightweight execution plan
  • “locking”, i.e. to avoid retrieving the same row twice
  • mutual exclusion between different pollers, if they exist
  • minimal blocking between different pollers

An approach like the following (in T-SQL) satisfies the aforementioned properties. A new “Status” value is introduced to denote entries under processing. It is assumed that the “Status” of each entry will be set to “processed” as its processing finishes, or reverted to “unprocessed” otherwise. An index on the “Status” column and the ID column (in that order) is also needed.

-- Number the oldest unprocessed entries (TheStatus = 0)...
with T as (
	SELECT Q.*, ROW_NUMBER() over (order by id) RN FROM TheQueue Q
	WHERE TheStatus = 0
)
-- ...then claim at most @N of them in a single atomic statement:
-- mark them as "under processing" (9), stamp the dequeue time,
-- and return them to the caller through the OUTPUT clause.
UPDATE TheQueue with (rowlock)
SET TheStatus=9, TmProcess=getdate()
OUTPUT deleted.id, deleted.payload
WHERE id in (select id from T where RN<=@N)

If the processing of the entries itself involves the database, don’t forget to limit the number of concurrent operations to the low number that measurement shows to perform best. I’ve seen queues with 900 concurrent threads (I’m not making this up…) which sped up by many orders of magnitude when the number of threads was reduced to… five. Notice also the setting of the “TmProcess” column, which holds the timestamp at which the entry was dequeued. Add a column registering when each entry was enqueued and another for when its processing finished, and you can keep track of very useful analytics for your queues. I may return with another post on this particular subject.
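To round this off, here is a minimal polling-loop sketch around the statement above (Python with a DB-API connection such as pyodbc; the status code 2 for “processed” and the process function are hypothetical):

import time

DEQUEUE_SQL = """
with T as (
    SELECT Q.*, ROW_NUMBER() over (order by id) RN FROM TheQueue Q
    WHERE TheStatus = 0
)
UPDATE TheQueue with (rowlock)
SET TheStatus=9, TmProcess=getdate()
OUTPUT deleted.id, deleted.payload
WHERE id in (select id from T where RN<=?)
"""

BATCH_SIZE = 5                  # keep concurrency low: measure, don't guess
POLL_INTERVAL_S = 1.0           # only matters when the queue runs dry

def process(entry_id, payload) -> bool:
    """Hypothetical worker; returns True on success."""
    return True

def poll_forever(conn):
    while True:
        cur = conn.cursor()
        cur.execute(DEQUEUE_SQL, (BATCH_SIZE,))
        batch = cur.fetchall()  # the OUTPUT clause returns the claimed rows
        conn.commit()           # the rows are now "locked" via TheStatus=9
        for entry_id, payload in batch:
            ok = process(entry_id, payload)
            # mark processed on success, revert to unprocessed otherwise
            cur.execute("UPDATE TheQueue SET TheStatus=? WHERE id=?",
                        (2 if ok else 0, entry_id))
            conn.commit()
        if not batch:
            time.sleep(POLL_INTERVAL_S)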