What’s in a Validity Interval?

What’s in a Validity Interval? Could a Validity Interval be any different, and would it smell any different? Well, the short answer is: yes. But the long answer is longer than you might think, and to furnish that, I have to start at the very beginning.

First of all, the Validity Interval is just a method of coercing a relational database, handling plain relational data, to represent sequenced data. By using a Validity Interval, you are actually using a particular encoding to achieve that. If you like a fascinating read, check out some of the articles (and the book) by Nikos Lorentzos, on a way to think about temporal data (actually, any data having intervals of any kind) that does not even need to step away from the pure relational model.

Having settled on using Validity Intervals, there are still choices to be made. Let’s divert a bit and visit TSQL2 (spec), which is like the Latin of sequenced query languages. TSQL2 has evolved while its proponents were pushing for its inclusion in SQL3, but in all incarnations it had syntax to specify the granularity of time. For example, in its latest incarnation as the proposed SQL/Temporal part of SQL3, the following statement would transform a regular relation to a sequenced relation with valid-time support with a granularity of a day.

ALTER TABLE salary ADD VALIDTIME DAY

Presumably, since granularity can be defined per relation, one could mix different granularities in a database, although I never found an example of that in the TSQL2 material I have looked at. Which makes one think that referencing relations of different granularities, in the same query, would force it to adopt the finest granularity of all. That thought and the fact that I needed a granularity of minutes, for the application I was considering, made me treat time at the finest granularity possible. Unfortunately, for Sql Server 2005, which I was using at the time, that was the granularity of the DATETIME datatype, which is… fuzzy. Because of the way DATETIMEs are encoded, the granularity is 3 – 4 ms (“Rounded to increments of .000, .003, or .007 seconds”). And, to be frank, as an engineer it was more natural to think of time as a continuous, rather than discrete, quantity.

It didn’t matter a lot, except it sort-of forced my hand during the next choice, which was between closed-closed ([a, b]) and closed-open ([a, b)) validity intervals. Not being able to succinctly compute the next instant function, made it more compelling to use the closed-open interval. There are also other advantages, explained also in “Developing Time-Oriented Database Applications in SQL” (Richard T. Snodgrass, available here), since closed-open interval arithmetic is a little easier. You can easily test if intervals meet end-to-end: [a, b) meets [b, c) and if you concatenate them you get [a, c).

And then there’s the issue of how to represent the end of the last validity period of the versions of an object. There are actually two ways: avoid the issue altogether, by keeping two separate tables for the historical portion and the current portion of the data, which complicates all queries that need to refer to both, and choosing some value to use. Or none, because NULL is not really a value, but would be an obvious choice, as it usually stands for “don’t care”, “don’t know”, “not applicable”, or all of the above. However, trying to use NULL in actual calculations makes ORs pop up left and right, which is detrimental to already heavy execution plans. So, I took the way proposed by R. Snodgrass and used a marker value of 3000-01-01, which has the property of being able to be used in interval arithmetic, because it’s a value that falls after everyday dates. The end of the validity of an object, unless it is deleted from one valid time onwards, is “until the end of time”, and 3000-01-01 plays that role effectively (as a specific value, it is proposed by R. Snodgrass as the largest round date that can be represented in all RDBMSs he had looked at).

Up to now, you must have realized the (innocent) trick I played in the previous posts (2013-04-16 and 2013-05-14) to avoid introducing complications that were not essential to the points I wanted to make: the ends of the validity periods I defined for the AdventureWorks tables, should have actually been one day in the future, since AdventureWorks defines closed-closed intervals at a granularity of a day. I hope you will forgive me…

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s