The Fallacy of OLTP

OLTP systems are commonly thought of as antithetical to Batch Processing systems, the two occupying opposite sides of a chasm. OLTP is epitomized by the Web Application paradigm (which, itself, is the modern-day equivalent of the time-sharing paradigm of days past), while Batch Processing is banished to long-running nightly jobs.

I am going to make the case that the two are synergistic, based on what it takes to fulfill the requirements of an on-line system rather than on aphorisms and conventional wisdom.

Requirements of an On-Line System

An on-line system is supposed to give an immediate, small answer to a user’s small request. The amount of data involved matters, because it pushes the design envelope towards minimizing latency rather than maximizing throughput. A Batch Processing system, on the other hand, works with large amounts of data but has no hard limit on the response time for any particular part of it. We are perfectly happy to wait two hours to process five million rows, not caring whether any particular row is processed before any other.

An on-line system is also thought of as a synchronous, blocking one, although this is not a hard rule; in fact, modern Web applications commonly use background AJAX calls (synchronous or asynchronous) and AJAX polling to present a better user experience. That is why the characteristic quality I am going to focus on is the transaction-at-a-time nature of processing.

Latency in the Real World

In the Goldilocks zone, a request-response system does minimize latency, as it time-shares the resources of the system among all concurrent users at once. We serve R requests per second, achieving an average response time of T. Now, let’s see what happens closer to the breaking point for a data-bound system.
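As a back-of-envelope sanity check, Little’s law (a standard queueing identity) ties R and T to the number of requests in flight at any instant:

    N = R \cdot T
    % e.g. R = 100 requests/s and T = 0.05 s  =>  N = 5 requests in flight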

Database-bound systems rely on connection pooling to throttle data operations. This is done because, when the star of RDBMSs started rising, it did not take long for people to realize that too many concurrent data operations essentially meant none at all. So a real system either does connection pooling, or should have been doing so, and I am assuming the presence of connection pooling in the sequel (in fact, certain programming environments, like ASP.NET, implement connection pooling natively, albeit with default settings that might not be optimal).
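To make that concrete, here is a minimal sketch of how the pool is sized in the ASP.NET/ADO.NET case, assuming System.Data.SqlClient; the server, database, and pool sizes below are placeholders, not recommendations:

    using System.Data.SqlClient;

    // ADO.NET pools connections per distinct connection string; the pool size
    // is controlled by the Min/Max Pool Size keywords (Max Pool Size defaults to 100).
    var connectionString =
        "Server=myServer;Database=myDb;Integrated Security=true;" +
        "Min Pool Size=2;Max Pool Size=10;";   // placeholder values

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();   // taken from the pool; blocks (up to the timeout) if all 10 are busy
        // ... run commands ...
    }                        // Dispose() returns the connection to the pool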

How many are too many concurrent data operations? There’s no golden road to this answer. The real answer is “the number you measure after you test for it”. A rule of thumb, however, would be “far fewer than you think”[1].

In any case, there is a turning point at a specific number, either because you have reached the connection pool size, or because you start thrashing your RDBMS, in which case you had better decrease the size of your connection pool accordingly. The turning point, as R increases, is when T suddenly rises because, either way, requests can no longer be serviced immediately. And this is when you need a paradigm shift to move forward.
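To put a rough number on that turning point (a back-of-envelope sketch; C and S are symbols introduced here for illustration): with a pool of C connections, each held for an average of S seconds per request, the sustainable throughput of the data tier is roughly

    R_{\max} \approx C / S
    % e.g. C = 10 connections, S = 0.05 s per request  =>  R_max ≈ 200 requests/s
    % for R > R_max, time spent waiting for a free connection dominates T

and pushing R past that bound is exactly where T starts its sudden rise.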

Moving into the Gray Area

Once you reach a state where a number of would-be concurrent requests have to wait, and their number starts to dominate over the requests actually being serviced, you are in the gray area between an on-line and a batch system. You have a backlog of pending requests that need to be serviced, and the minimal-latency constraint has already been compromised. It is time to take advantage of the RDBMS, a piece of software which is, as I like to call it, essentially an SIMD machine.

Running SQL on a single row at a time is a costly operation, whose cost we gladly pay as long as we get back immediate responses and low latency, but the hard fact is that running essentially the same SQL on many rows at once scales sub-linearly: processing N rows in one statement costs far less than N times the single-row cost[2].
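As a minimal sketch of that contrast (the Orders table, its columns, and the ADO.NET code are hypothetical illustrations, not taken from this post):

    using System.Collections.Generic;
    using System.Data.SqlClient;

    static class OrderProcessing
    {
        // Transaction-at-a-time: one round trip, one statement execution,
        // one implicit commit per row.
        public static void ProcessOneByOne(SqlConnection connection, IEnumerable<int> orderIds)
        {
            foreach (int orderId in orderIds)
            {
                using (var cmd = new SqlCommand(
                    "UPDATE Orders SET Status = 'Processed' WHERE OrderId = @id", connection))
                {
                    cmd.Parameters.AddWithValue("@id", orderId);
                    cmd.ExecuteNonQuery();
                }
            }
        }

        // Set-at-a-time: one round trip and one statement over the whole set;
        // per-statement and per-round-trip overhead is paid once, not once per row.
        public static void ProcessAsSet(SqlConnection connection)
        {
            using (var cmd = new SqlCommand(
                "UPDATE Orders SET Status = 'Processed' WHERE Status = 'Pending'", connection))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }

The set-at-a-time version assumes the batch can be described by a predicate; passing an explicit set of keys instead is what the table-valued-parameter sketch further down illustrates.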

To give an example: in an experiment I ran in January, transaction-at-a-time code (containing a lot of logic, amounting to many tens of SQL statements) that was being invoked in parallel was translated to batch processing and tuned to an optimal batch size (200, in this case). The threaded transaction-at-a-time code achieved 20 items per second and pushed CPU usage above 40%; the batch code achieved 95 items per second and kept CPU usage below 25%.

This was one particular case, and any other case would need its own experimentation and tuning to find the sweet spot. However, I think I can make the general statement that, once the throughput limit of a transaction-at-a-time system is breached, switching to batch processing can increase throughput and bring latency back into the comfort zone.

No Free Lunch

Taking advantage of this does not come free. There is no automatic way to write the logic once and invoke it both ways, much less so if much of the logic lives in ordinary application code, which is written at the single-object level, unless you happen to use J (http://www.jsoftware.com/ ) or other non-mainstream languages, where the array is the natural level of granularity and the individual object is a special case.

With great pain, one can adapt application code in conventional languages to always pass arrays around, and SQL code to accept whole relations (e.g. SQL Server’s table-valued parameters to stored procedures). One can then invoke the code with data sizes of one or more, according to the current workload, and have it both ways. Or implement two different versions of everything and switch between them.
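Here is a minimal sketch of the table-valued-parameter route, assuming SQL Server and ADO.NET; the type, procedure, table, and column names are hypothetical:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    // One-time T-SQL setup (hypothetical names):
    //   CREATE TYPE dbo.OrderIdList AS TABLE (OrderId INT PRIMARY KEY);
    //   CREATE PROCEDURE dbo.ProcessOrders @ids dbo.OrderIdList READONLY AS
    //       UPDATE o SET o.Status = 'Processed'
    //       FROM Orders o JOIN @ids i ON i.OrderId = o.OrderId;

    static class BatchInvocation
    {
        // The same call path serves a "batch" of one row or of several hundred.
        public static void ProcessOrders(SqlConnection connection, IEnumerable<int> orderIds)
        {
            var table = new DataTable();
            table.Columns.Add("OrderId", typeof(int));
            foreach (int id in orderIds)
                table.Rows.Add(id);

            using (var cmd = new SqlCommand("dbo.ProcessOrders", connection))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                var p = cmd.Parameters.AddWithValue("@ids", table);
                p.SqlDbType = SqlDbType.Structured;   // table-valued parameter
                p.TypeName = "dbo.OrderIdList";
                cmd.ExecuteNonQuery();
            }
        }
    }

Whether the batch size is 1 or 200, the application-side code path stays the same; only the number of rows in the DataTable changes.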

In any case, this is not for the faint of heart, and it needs senior-level guidance for everyone involved, as well as the mentality to make no assumptions and to measure everything before arriving at conclusions.


[1] Recent experience of mine, in a queue-servicing data-bound application: dropping the thread-pool size from 200 to 10 quintupled the processing rate.

[2] Quoting from http://technet.microsoft.com/en-us/library/hh393551.aspx: "In some cases, batched background processing can be more efficient than singleton transactions. Generally, batched transactions use fewer resources per unit of work because the overhead associated with transaction initiation and termination is amortized over a larger number of transactions. When using batched background processing, incorporate batch sizes that are consistent with other concurrent operations. For example, work on a few thousand rows at a time to mitigate lock escalation (for example, to table locks)."
