LtlChecker: A spin-off of Pandamator

In the course of playing around with code to run the Büchi automata for the LTL constraint I’m adding to Pandamator, I realized that all of that could be packaged into a nifty little feature for checking object evolution (or the evolution of whole systems, for that matter). Enter LtlChecker, a library to help you apply the power of LTL to your imperative programs. Its only dependency is SPOT.fs from the Pandamator project, whose job is to execute SPOT and parse the automaton it produces. And SPOT itself, of course. I took the easy way out regarding how SPOT is found: it must be on the path. Please note that cygwin1.dll must be on the path as well, for SPOT to be invoked.

SPOT.fs can be found in the SVN repository and, in any case, I have described it in the relevant post. I will focus on the new code, LtlChecker.fs. In an attempt to make it more C#-friendly, I have used Dictionary instead of Map.

Here’s a simple usage scenario, which I will use to demonstrate how it works; I’ll present the implementation later.

type _test_struct = {
    mutable x: int
}

let _unit_test () =
    let obj : _test_struct = { x = 0 }
    let dic = new Dictionary<string,System.Func<bool>> ()
    dic.Add("p0",(fun () -> obj.x > 0))
    dic.Add("p1",(fun () -> obj.x > 0))
    let con = create_constraint "G(p0 -> G p1)" dic
    let i0 = init con
    obj.x <- 1
    let i1 = next i0
    obj.x <- 0
    let i2 = next i1
    obj.x <- 1
    let i2b = next i1
    (i0,i1,i2,i2b)

I have actually traced two diverging paths for the constraint G(p0 -> G p1), where p0 and p1 both mean “property x of the test object is > 0”. The constraint can be translated as “once x becomes > 0, it never becomes zero again”.

Calling create_constraint creates an LtlConstraint, which is used to initialize the constraint checking. Calling init on it evaluates the propositions at the initial state and gives you back a ConstraintResult, which is what you’ll be examining throughout the constraint checking. Each subsequent call to next re-evaluates the propositions and advances the automaton one step. The relevant property to look at is status, an enumeration that tells you whether the constraint has failed (Failed), whether the system is at a state where it can be terminated with the constraint fulfilled (CanTerminate), or whether it cannot be terminated yet because the constraint awaits fulfillment (CannotTerminate).
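To make the calling pattern concrete, here is a minimal monitoring loop. It is just a sketch of my own built on the same API; check_trace is not part of the library, and the mutation steps are passed in as plain functions.

let check_trace (con: LtlConstraint) (steps: (unit -> unit) list) =
    let mutable result = init con
    for step in steps do
        if result.status <> Failed then
            step ()                 // mutate the system under observation
            result <- next result   // re-evaluate the propositions and advance the automaton
    result.status

With the object and dictionary of the example above, check_trace con [ (fun () -> obj.x <- 1); (fun () -> obj.x <- 0) ] would come back Failed, while stopping after the first step would leave it at CanTerminate.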

Let’s look at it in practice. Here’s what i0 looks like.

{currentstates = [{state = 0;
                      accepting = true;}];
    status = CanTerminate;
    ltlconstraint =
     {automaton =
       {acccondnum = 1;
        states =
         [|{initial = true;
            transitions =
             [{endstate = 1;
               acccond = [0];
               guard = Bin (And,Prop "p1",Prop "p0");};
              {endstate = 0;
               acccond = [0];
               guard = Not (Prop "p0");}];};
           {initial = false;
            transitions = [{endstate = 1;
                            acccond = [0];
                            guard = Prop "p1";}];}|];};
      propositions =
       dict
         [("p0",
           System.Func`1[System.Boolean] {Method = Boolean Invoke();
                                          Target = LtlChecker+_unit_test@87;});
          ("p1",
           System.Func`1[System.Boolean]
             {Method = Boolean Invoke();
              Target = LtlChecker+_unit_test@88-1;})];};}

Status is CanTerminate, because x is still 0. In i1, status is still CanTerminate, and the current state is now state 1.

In i2, where x has gone back to 0, the status is Failed! That’s what one would expect.

In i2b, which is an alternative path from i1 but with x continuing to be bigger than 0, the status remains CanTerminate.

That’s it. A small facility for applying temporal logic to the evolution of mutable systems! In fact, you can apply it to any time series.
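For instance, here is one way to run a constraint over a pre-recorded series of readings: point the propositions at a mutable cell and replay the series through it. This is a sketch of my own (check_series is not part of the library), assuming the LtlChecker definitions shown below are in scope and the series is non-empty.

// Check "once the reading exceeds 100, it never drops back" over a recorded series.
let check_series (series: int list) =
    let current = ref (List.head series)
    let dic = new Dictionary<string,System.Func<bool>> ()
    dic.Add("p0", (fun () -> current.Value > 100))
    dic.Add("p1", (fun () -> current.Value > 100))
    let con = create_constraint "G(p0 -> G p1)" dic
    let mutable result = init con          // the first value plays the role of the initial state
    for v in List.tail series do
        current.Value <- v                 // next observation
        result <- next result              // advance the automaton
    result.status

check_series [50; 120; 80] comes back Failed, whereas check_series [50; 120; 130] stays at CanTerminate.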

Nota Bene: I cannot guarantee that my interpretation of how to run a Büchi automaton is correct. What works for the LTL expression above might not work for some other expression. If I find problems (and manage to overcome them), I will update the code on SVN.

Here’s the code itself (it can also be found in the SVN repository, in the LtlChecker project).

module LtlChecker

open System.Collections.Generic
open SPOT

type LtlConstraint = {
    automaton: BuchiAutomaton;
    propositions: Dictionary<string,System.Func<bool>>;
}

type ConstraintStatus = Failed | CanTerminate | CannotTerminate

type ConstraintState = {
    state: int;
    accepting: bool;
}

type ConstraintResult = {
    currentstates: ConstraintState list;
    status: ConstraintStatus;
    ltlconstraint: LtlConstraint;
}

let rec execute_logexp (propositions:Dictionary<string,System.Func<bool>>) (exp:LogicExpr) =
    match exp with
    | Const v -> v
    | Not e -> not (execute_logexp propositions e)
    | Bin (o, x, y) ->
        let lx = execute_logexp propositions x
        let ly = execute_logexp propositions y
        match o with
        | And -> lx && ly
        | Or -> lx || ly
        | Impl -> execute_logexp propositions (Bin(Or,Not(x),y))
        | Equiv -> execute_logexp propositions (Bin(Or,Bin(And,x,y),Bin(And,Not(x),Not(y)))) // (x and y) or (not x and not y)
        | Xor -> execute_logexp propositions (Bin(And,Bin(Or,x,y),Not(Bin(And,x,y)))) // (x or y) and not(x and y)
    | Prop p ->
        let rc = propositions.[p]
        rc.Invoke()

let create_constraint (ltl:string) (propositions:Dictionary<string,System.Func<bool>>) =
    let ba = ltl2ba ("'"+ltl+"'")
    { automaton = ba; propositions = propositions }

let do_transitions (state:int) (con:LtlConstraint) =
    let states = con.automaton.states.[state].transitions
                        |> List.filter (fun t -> execute_logexp con.propositions t.guard)
                        |> List.map (fun t -> { state = t.endstate; accepting = t.acccond.Length > 0 }) // Finite paths cannot work with more than one accepting condition
    states

let states_to_result (nextstates:ConstraintState list) (con:LtlConstraint) =
    let status =
        if nextstates.Length = 0 then
            Failed
        elif nextstates |> List.exists (fun s -> s.accepting) then
            CanTerminate
        else
            CannotTerminate

    { currentstates = nextstates; ltlconstraint = con; status = status }

let init (con:LtlConstraint) =
    let initialstate = con.automaton.states |> Array.findIndex (fun st -> st.initial)
    let nextstates = do_transitions initialstate con
    states_to_result nextstates con

let next (res:ConstraintResult) =
    let nextstates = res.currentstates |> List.map (fun s -> do_transitions s.state res.ltlconstraint) |> List.fold (@) []
    states_to_result nextstates res.ltlconstraint

New Table Constraint Planned for Pandamator, part 2

As I promised in the previous relevant post, I am moving forward with the LTL constraint for Pandamator (as any of you who have been following the iteration on Flying Donut already know). Apart from some integration in the maintenance scripts, all that remains is actually running the Büchi automaton. Which, of course, is not an easy task to do in SQL… That’s why I’m going to stick to the positive side only, and give you an account of what I have done so far.

The actual syntax of the constraint is very close to what I wrote earlier (I used WITH instead of WHERE, because WITH is already used as a keyword for defining things).

alter table Production.Product 
add validtime constraint weightneverdec 
check 'G(p0 -> G p1)' 
with p0 = weight > 0, p1 = weight > 0

This is a simple constraint for testing; it means that, once Product.weight becomes greater than zero, it never becomes zero again.

SPOT happily produces the following automaton for that expression.

weightneverdec

2 1t
0 1
1 0 -1 & p1 p0
0 0 -1 ! p0
-1
1 0
1 0 -1 p1
-1

The automaton is parsed internally to the following value.

val it : BuchiAutomaton =
  {acccondnum = 1;
   states =
    [|{initial = true;
       transitions =
        [{endstate = 1;
          acccond = [0];
          guard = Bin (And,Prop "p1",Prop "p0");}; {endstate = 0;
                                                    acccond = [0];
                                                    guard = Not (Prop "p0");}];};
      {initial = false;
       transitions = [{endstate = 1;
                       acccond = [0];
                       guard = Prop "p1";}];}|];}

My (simple) parser follows almost the full LBTT definition, except that I assume states are named as consecutive numbers starting at zero, because that is what SPOT does.

The abstract syntax tree for the automaton is defined by the following types.

type LogicOp = And | Or | Impl | Xor | Equiv

type LogicExpr =
    | Prop of string
    | Const of bool
    | Not of LogicExpr
    | Bin of LogicOp * LogicExpr * LogicExpr

type BuchiAutomatonTransition = {
    endstate: int;
    acccond: int list;
    guard: LogicExpr;
}

type BuchiAutomatonState = {
    initial: bool
    transitions: BuchiAutomatonTransition list;
}

type BuchiAutomaton = {
    acccondnum: int;
    states: BuchiAutomatonState[];
}

A simple translation layer transforms the propositional logic of the guards, together with the definition of the propositions in the constraint, to SQL.

let rec logexp2stringlist (rcl:PropositionDefinition list) (exp:LogicExpr) =
    match exp with
    | Const true -> ["(1=1)"]
    | Const false -> ["(1=0)"]
    | Not e -> ["NOT";"("] @ (logexp2stringlist rcl e) @ [")"]
    | Bin (o, x, y) -> 
        let lx = logexp2stringlist rcl x
        let ly = logexp2stringlist rcl y
        match o with
        | And -> ["(";"("] @ lx @ [")";"and";"("] @ ly @ [")";")"]
        | Or -> ["(";"("] @ lx @ [")";"or";"("] @ ly @ [")";")"]
        | Impl -> logexp2stringlist rcl (Bin(Or,Not(x),y))
        | Equiv -> logexp2stringlist rcl (Bin(Or,Bin(And,x,y),Bin(And,Not(x),Not(y)))) // (x and y) or (not x and not y)
        | Xor -> logexp2stringlist rcl (Bin(And,Bin(Or,x,y),Not(Bin(And,x,y)))) // (x or y) and not(x and y)
    | Prop p -> 
        let rc = rcl |> List.find (fun rc -> rc.name = p)
        "(" :: rc.expression :: [")"]

During ALTER VALIDTIME execution, a view is created to abstract away the exact expressions used for the propositions (exposed as 0/1 numbers, since SQL does not recognize logical expressions outside WHERE).

CREATE VIEW [Production].[Product_weightneverdec_GuardView] WITH SCHEMABINDING as
SELECT
 ProductID,
 ValidToDate,
 row_number() over (partition by ProductID order by ValidToDate) AS VersionOrdinal,
 case when ( ( ( weight > 0 ) ) and ( ( weight > 0 ) ) ) then 1 else 0 end AS g0_0,
 case when NOT ( ( weight > 0 ) ) then 1 else 0 end AS g0_1,
 case when ( weight > 0 ) then 1 else 0 end AS g1_0
FROM [Production].[Product]

That’s it. I wish I had done more, but I believe I made good use of my scant free time to do as much as possible. Expressed like that, the view can be used to perform (non-deterministic, i.e. parallel…) runs of the automaton on the actual object histories. Stay tuned, I hope my next post on the subject will be a triumphant conclusion to creating the constraint.

“Lenient Parsing” and why I regret it

Don’t look it up — “Lenient Parsing” is a term of my own. When writing Pandamator, I wanted to make the point that the query transformations I was using did not care which SQL expressions were used for SELECT, WHERE, etc. (except when they did; see the footnote). Which is one of my good/bad facets: sometimes I like to push an idea to see how far it goes. In this case it went very far, because I could ignore the content of all expressions and ignore the subtleties of Transact-SQL syntax and its type system.

“Lenient parsing” was the way I made the point. It consists of not parsing at all: I just amass all the tokens and declare them an expression. The side effect was that I had to delimit all statements with semicolons, which I rather preferred. (I actually did that for whole statements as well, in an effort to read whole scripts and just ignore anything that was not a temporal statement. I have since aimed a little lower.) Anyway, I made my point, proved that I could get away with minimal syntactic transformations, and now “Lenient Parsing” (how on earth I came up with this weird term, I don’t know) gets in my way.

For one thing, I have been wanting to implement something I call (buckle up, because another weird term of my own invention is coming) “assisted nonsequenced queries”, which will be nonsequenced queries with an implicit “join” with the “spans” of the query, the latter being the largest contiguous periods that do not partially overlap any validity period of any object of any relation. Within each such “span”, nothing changes, so these are like extended instants (you can read more on that in “SQL Extension for Interval Data”, Lorentzos, Mitsopoulos). This would allow one to write something akin to IXSQL, but then one would have to perform arithmetic on time, and not having real expressions to manipulate is an impediment.

Secondly, I’ve been wanting to write an extension to SQL ever since I started doing a lot of SQL work. It’s a good thing that I didn’t, though, because what I would have written at the time would have been what I thought I needed for better programming under Transact-SQL (my name for it was SQL DoneRight, so that gives you the idea). However, my point of view has changed, and I have become convinced that the way to go is to avoid the procedural, Basic-like language altogether and go for pure SQL, working on whole relations at once and abolishing scalars. I hope you’re intrigued, but I’m not going to go into details just yet, as I’m preparing a “manifesto” blog post for that. I’m just going to remind you of my post about how to play to the strengths of an RDBMS and avoid shoehorning it into procedural, object-at-a-time computation. This extension will have a full type system, which necessitates handling real expressions.

In conclusion, “Lenient Parsing” was a great idea, but it’s now time to let it go and switch to real parsing of expressions, even if that means losing some obscure part of the Transact-SQL grammar in the process.

Footnote: Inner subqueries, of course, had to be treated differently from other elementary expressions.

New table constraint planned for Pandamator (Constraints on workflow paths, part 3)

Building upon the thoughts in part 1 and part 2 of this series of posts, I’m planning a new table constraint for Pandamator.

First of all, I have created Pandamator as a public project in the Flying Donut tool, so you’ll be able to keep tabs on me while I’m moving Pandamator forward. While you’re at it, you can register for free and use it for your own projects. Don’t worry if you are not an expert Donut Flyer; a sample project will be created for you to play with upon registration. I have great hopes for this tool and I have actually invested in its development. So I’m putting my money where my mouth is, by using it for Pandamator!

You’ll find an iteration (Scrum terminology) for the new table constraint there, so I’m not going to explain the breakdown into items and tasks. I’m going to give you an outline of how I’m planning for it to work. It’s the first big addition that diverges from ATSQL. Here’s what I’m thinking it should look like.

ALTER TABLE T ADD VALIDTIME CONSTRAINT C CHECK('[] p1 -> <> p2')
WHERE p1 = AMOUNT > 100, p2 = POSITION > 2

It’s a variation of the CHECK constraint, where the constraint is not a logical expression but a string. This is done so that the constraint itself can be whatever SPOT accepts (LTL and PSL), without interfering with the SQL grammar. The connection to the SQL grammar will be in the WHERE clause that follows this novel CHECK constraint, which provides the meaning of the propositions used inside the LTL/PSL formula as SQL expressions on the object in question. I settled on that as a solution to ensure separation of concerns and a smooth transition between the two domains. SPOT will do its job and create the Büchi automaton, and Pandamator will generate proper SQL for the propositional logic formulas within, by plugging in the propositions defined in the constraint. Before that, I toyed with grandiose ideas of extending the SQL grammar to add LTL/PSL, but the solution I’m proposing here seems a lot cleaner in the end.

I’ll post an update when this progresses a bit, and you can keep track of it yourselves at the relevant iteration.

What’s in a Validity Interval?

What’s in a Validity Interval? Could a Validity Interval be any different, and would it smell any different? Well, the short answer is: yes. But the long answer is longer than you might think, and to furnish that, I have to start at the very beginning.

First of all, the Validity Interval is just a method of coercing a relational database, handling plain relational data, to represent sequenced data. By using a Validity Interval, you are actually using a particular encoding to achieve that. If you like a fascinating read, check out some of the articles (and the book) by Nikos Lorentzos, on a way to think about temporal data (actually, any data having intervals of any kind) that does not even need to step away from the pure relational model.

Having settled on using Validity Intervals, there are still choices to be made. Let’s divert a bit and visit TSQL2 (spec), which is like the Latin of sequenced query languages. TSQL2 evolved while its proponents were pushing for its inclusion in SQL3, but in all incarnations it had syntax to specify the granularity of time. For example, in its latest incarnation as the proposed SQL/Temporal part of SQL3, the following statement would transform a regular relation into a sequenced relation with valid-time support at a granularity of a day.

ALTER TABLE salary ADD VALIDTIME DAY

Presumably, since granularity can be defined per relation, one could mix different granularities in a database, although I never found an example of that in the TSQL2 material I have looked at. Which makes one think that referencing relations of different granularities in the same query would force the query to adopt the finest granularity of all. That thought, and the fact that I needed a granularity of minutes for the application I was considering, made me treat time at the finest granularity possible. Unfortunately, for SQL Server 2005, which I was using at the time, that was the granularity of the DATETIME datatype, which is… fuzzy. Because of the way DATETIMEs are encoded, the granularity is 3 – 4 ms (“Rounded to increments of .000, .003, or .007 seconds”). And, to be frank, as an engineer I found it more natural to think of time as a continuous, rather than discrete, quantity.

It didn’t matter a lot, except that it sort of forced my hand during the next choice, which was between closed-closed ([a, b]) and closed-open ([a, b)) validity intervals. Not being able to succinctly compute the next-instant function made it more compelling to use closed-open intervals. There are other advantages as well, explained in “Developing Time-Oriented Database Applications in SQL” (Richard T. Snodgrass, available here), since closed-open interval arithmetic is a little easier. You can easily test whether intervals meet end-to-end: [a, b) meets [b, c), and if you concatenate them you get [a, c).

And then there’s the issue of how to represent the end of the last validity period of the versions of an object. There are actually two ways: avoid the issue altogether by keeping two separate tables for the historical portion and the current portion of the data (which complicates all queries that need to refer to both), or choose some value to use. Or no value at all: NULL is not really a value, but it would be an obvious choice, as it usually stands for “don’t care”, “don’t know”, “not applicable”, or all of the above. However, trying to use NULL in actual calculations makes ORs pop up left and right, which is detrimental to already heavy execution plans. So I took the way proposed by R. Snodgrass and used a marker value of 3000-01-01, which has the property of being usable in interval arithmetic, because it is a value that falls after all everyday dates. The end of the validity of an object, unless it is deleted from one valid time onwards, is “until the end of time”, and 3000-01-01 plays that role effectively (as a specific value, it is proposed by R. Snodgrass as the largest round date that can be represented in all the RDBMSs he had looked at).
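To make the closed-open arithmetic and the end-of-time marker concrete, here is a small F# sketch. It is an illustration of my own, not Pandamator code; the Interval type and the function names are mine.

open System

// The "end of time" marker: a round date beyond everyday dates (after Snodgrass).
let endOfTime = DateTime(3000, 1, 1)

// A closed-open validity interval [From, To).
type Interval = { From: DateTime; To: DateTime }

// [a, b) meets [b, c): the first ends exactly where the second begins,
// with no need to compute a "next instant".
let meets (i1: Interval) (i2: Interval) = i1.To = i2.From

// Concatenating two meeting intervals gives [a, c).
let concat (i1: Interval) (i2: Interval) =
    if meets i1 i2 then Some { From = i1.From; To = i2.To } else None

// Two closed-open intervals overlap iff each one starts before the other ends.
let overlaps (i1: Interval) (i2: Interval) = i1.From < i2.To && i2.From < i1.To

// A "current" version simply has the marker as its end and still takes part
// in ordinary comparisons, with no ORs needed for NULL handling.
let previous = { From = DateTime(2002, 7, 1); To = DateTime(2003, 7, 1) }
let current  = { From = DateTime(2003, 7, 1); To = endOfTime }
// meets previous current = true; overlaps previous current = false

The point is only that every test remains a plain comparison on date values, with no special cases for the current version.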

By now, you must have realized the (innocent) trick I played in the previous posts (2013-04-16 and 2013-05-14) to avoid introducing complications that were not essential to the points I wanted to make: the ends of the validity periods I defined for the AdventureWorks tables should actually have been one day in the future, since AdventureWorks defines closed-closed intervals at a granularity of a day. I hope you will forgive me…

Sequenced Foreign Key

Last time I wrote about temporal databases, I had used a mostly-unaltered copy of the well-known AdventureWorks database sample. I thought it was best to base any material I present on that, because it is widespread and does not need much introduction. I see now that I will not be able to stick to that much longer because, as far as temporal data are concerned, AdventureWorks is quite limited out of the box. However, I do not have the time right now to prepare any other sample of significant size, so I will continue using it, for the time being, and explain where it is lacking. And one very crucial thing it has that gets in the way of demonstrating temporal databases is autonumbering columns.

Cue the concept of a sequenced Primary Key. Like the Primary Key in a conventional database (called a Snapshot database when viewed from within a temporal setting), a sequenced Primary Key defines the identity of the data items (notice I don’t call them rows) and is unique among all items — but this time, it is unique at each moment in time. Which means that, in the general case, a single data item can change through time and actually be represented by several rows of the relation (and that’s why I did not call items rows to begin with). Autonumbering columns (IDENTITY columns, as they’re called in Transact-SQL) are a decent way to provide automatic surrogate keys in snapshot relations, but they make it impossible to use the table as-is to hold multiple versions of an item, which means I either have to start twiddling more with the AdventureWorks schema (and end up having to create a migration script), or skip the sequenced Primary Key for a while and jump directly to the sequenced Foreign Key.

The sequenced Foreign Key, as you have probably already guessed, is a reference from one relation to another that is defined for each moment in time. And this means that, time being as vast as it is, a data item (let’s call them objects, to simplify things) in the first relation can reference different objects in the second relation at different moments in time. But, just like its snapshot equivalent, a sequenced foreign key can only reference sequenced primary keys that exist for the moments of time involved.
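To make that concrete, here is a minimal sketch of the check this implies: every referencing object’s validity period must be entirely covered by the union of the validity periods during which the referenced primary key exists. It is an illustration of my own, not Pandamator code, and the Period type and isCovered function are hypothetical names.

open System

// A closed-open validity period [From, To).
type Period = { From: DateTime; To: DateTime }

// True if the child period is entirely covered by the union of the
// (possibly non-contiguous) periods during which the parent key exists.
let isCovered (child: Period) (parents: Period list) =
    // Walk forward from the start of the child period, always jumping to the
    // furthest end of a parent period that contains the current point.
    let rec cover (point: DateTime) =
        if point >= child.To then true
        else
            match parents |> List.filter (fun p -> p.From <= point && point < p.To) with
            | [] -> false                                                // a gap: the constraint is violated here
            | ps -> cover (ps |> List.map (fun p -> p.To) |> List.max)   // keep going from the furthest end
    cover child.From

// A child valid over July 2004, referencing a parent that only exists
// from mid-2002 to mid-2003, is not covered: the constraint is violated.
let child = { From = DateTime(2004, 7, 3); To = DateTime(2004, 7, 14) }
let parentVersions = [ { From = DateTime(2002, 7, 1); To = DateTime(2003, 7, 1) } ]
let violated = not (isCovered child parentVersions)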

Transposing it to my mostly-unaltered AdventureWorks database lacks a bit of generality, because each object is only defined for a single validity period. Trust me, having multiple, non-coinciding (and possibly non-contiguous) validity intervals in both relations makes everything a little more exciting (read as: vastly more complicated), but the essence of the sequenced Foreign Key will shine through, even with this example.

For this example, I’ll “upgrade” the Foreign Key that exists from the Production.WorkOrder relation to the Production.Product relation. I chose to use SellStartDate to SellEndDate as the validity period of the product. That does not mean the creators of the AdventureWorks schema had that interpretation in mind, but this interpretation of the validity period can be used to make my point (and there are no other suitable columns in the table, anyway). After adding the validity period columns to Production.Product and making it a State relation, like I did with Production.WorkOrder in my previous post on temporal databases, I declare the integrity constraints. I actually had to modify Pandamator to allow that, because it used to ignore IDENTITY columns altogether, since they were antithetical to the presence of real sequenced relations.

alter table Production.Product add validtime constraint PK primary key(ProductID);

alter table Production.WorkOrder add validtime constraint PK primary key(WorkOrderID);

alter table Production.WorkOrder add validtime constraint FK_Product foreign key(ProductID) references Production.Product on delete restrict;

After declaring them, I create the temporal artifacts.

alter validtime

Of interest is the procedure Production.ChkFK_Product_WorkOrder_To_Product, which exists to be invoked automatically in the case of modifications that could violate the Foreign Key. Since my computed columns for the validity intervals are not updatable, you can’t actually change them (SQL Server is smart enough to decline running such an update to begin with), but you can trigger the check by doing a dummy update in SQL Server Management Studio.

UPDATE Production.Product SET Name = Name WHERE ProductID = -1

Just something to invoke the trigger. You’ll get back an error like the following.

Msg 50000, Level 16, State 2, Procedure TR_FK_Product_WorkOrder_To_Product_R, Line 15
Transaction violates sequenced referential constraint for FK_Product(RESTRICT) Production.WorkOrder to Production.Product, e.g. Production.WorkOrder.WorkOrderID=72587 ValidFromDate="Jul  3 2004 12:00AM" ValidToDate="Jul 14 2004 12:00AM"
Msg 3609, Level 16, State 1, Line 1
The transaction ended in the trigger. The batch has been aborted.

What this says is that the foreign key is violated (it is actually invalid to begin with, as Pandamator does not currently check integrity constraints at the time of the ALTER VALIDTIME command), and an example of the violation is the WorkOrder with WorkOrderID=72587 between July 3rd, 2004 and July 14th, 2004, which is the whole validity period for that WorkOrder, if you check it. Which makes sense, since the Product with ProductID=804 is only valid between July 1st, 2002 and June 30th, 2003! We have uncovered inconsistencies between the validity intervals of Products and WorkOrders in the AdventureWorks database! (Don’t take that too literally, as the interpretations I’m using do not actually exist in the original database.)

In a different article, I may go into more depth on what the sequenced Foreign Key delete rules mean (no, there are no update rules in Pandamator, as I don’t currently allow updating the sequenced Primary Key). I also promise to spend some time on how the validity interval works in Pandamator, and on how I’ve not been completely honest in interpreting the AdventureWorks datetime columns as validity endpoints.