Thursday, July 17, 2008

Disk Payload Management

Transfer of data has an upper bound set by the speed of light and a lower bound set by the size of the budget, excluding spooky action at a distance and physics not yet known. It's all fun and games until something divides by zero.

In a delightful teaser article, Neil J. Gunther's "The Guerrilla Manual" delivers a bolus of refreshing views on capacity planning and performance management with a cleansing amount of terse common sense.

In particular, he notes, "You never remove the bottleneck, you just shuffle the deck."

Network Effects and Blinkenlights

Back in the mid-1980s, at least one large financial institution allocated IT budgets using a simple ratio based on the number of customer accounts by type, with appropriate finagle factors. At least it was a model that, assuming a lot of linearity, had simplicity and apparent transparency going for it.
Of course, these were the times of data centers with big boxes, and the occasional minicomputer. The unit costs of processing, networking, and storage were significant, whether measured in cycles per dollar, bits or bytes per dollar, or cycles per watt.

Of course, also, the use cases for the technology moved rather slowly, punctuated occasionally by growing online inquiry from, say, customer service agents, or by the addition of Automated Teller Machines to the CICS olio of big-iron code.

More gadgets and new approaches to programming by the end users (unclean!!!) had rather surprising effects upon infrastructure, from rampant flaming queries (what did he say?) to complete suites of large-scale computing systems dedicated to new types of models. In the case of financial services, one big dude jammed with APL to work out fixed-income dynamics. APL, for those who don't recall, was developed for passive-aggressive savants who didn't want management looking into what they'd written. But, letting the punishment fit the crime, APL rocked for matrix operations and was a darling of the early generation of quants, including those laugh-a-minute actuaries.

Somewhere, someplace, someone is hacking FPGAs to stick into a Beowulf cluster of Xboxes. I gotta feeling.

So where were we... Oh, so the point is that the common factor across these early instances of "end user" computing was moderate and increasing network effects. Transactional data could be used as feeds to these user-managed systems, and network effects, with emphasis upon storage and I/O tuning, became significant as a means of moving the bottleneck back to the CPU. Now pick another card.

The disk-to-disk discussion comprises several use cases, ranging from performance optimization (e.g., put the top 10 movies on the edge of the network) to business continuance to the meta issue of secure transfer and "lockup" of the data. Problem is, how does one deal with this mess, which embraces Service Oriented Architectures and Halo dynamism?

Intelligent Payloads?

This problem of placing data and copies of data in "good enough" sites on the network seems to hinge on how these data are tagged in such a way as to inform the "system" itself of the history of the atomic piece of interest as it transits other systems and networks. Perhaps something that appends usage information to the information itself, rather like appending travel stickers to an old steamer trunk, tracing its owner's tours of Alice Springs, Kenosha, and Banff.
And no, I'm not advocating still another in-band system monitor... more MIBs than MIPS and all of that.
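
To make the travel-sticker notion a bit more concrete, here is a minimal sketch, assuming a hypothetical Payload wrapper that accumulates its own usage history as it hops between systems (the class and field names are illustrative only, not any existing standard):

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class Sticker:
        """One 'travel sticker': where the payload was and what happened there."""
        site: str      # e.g. "edge-cache-04" or "dr-site-chicago" (made-up names)
        action: str    # e.g. "extracted", "replicated", "read"
        at: str        # ISO 8601 timestamp of the hop

    @dataclass
    class Payload:
        """The data plus its own usage history, carried along as it transits systems."""
        name: str
        data: bytes
        history: list = field(default_factory=list)

        def stamp(self, site: str, action: str) -> None:
            """Append a sticker recording this hop to the payload's own history."""
            self.history.append(
                Sticker(site, action, datetime.now(timezone.utc).isoformat()))

    # The payload narrates its own itinerary as it moves.
    p = Payload("q2-fixed-income-model", b"...")
    p.stamp("mainframe-feed", "extracted")
    p.stamp("edge-cache-04", "replicated")
    for s in p.history:
        print(s.site, s.action, s.at)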

This could, I believe, be a fertile area for new types of automation that begin to apply optimization (satisficing, most likely, in the sense of "good enough" strategies; see Herbert Simon for more G2), thereby, maybe (he qualifies again!), reducing the amount of time and money spent on forensics and the weird extraction of information needed to govern surprisingly fluid dynamic systems.
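
A toy illustration of satisficing in this placement setting (sites and thresholds invented for the example): rather than hunting for the globally optimal site, accept the first candidate that clears a "good enough" bar on latency and cost.

    # Satisficing placement: take the first site that is "good enough,"
    # rather than exhaustively optimizing. Sites and thresholds are made up.
    sites = [
        {"name": "core-dc",   "latency_ms": 40, "cost_per_gb": 0.02},
        {"name": "edge-east", "latency_ms": 12, "cost_per_gb": 0.08},
        {"name": "edge-west", "latency_ms": 15, "cost_per_gb": 0.07},
    ]

    def satisfice(candidates, max_latency_ms, max_cost_per_gb):
        """Return the first candidate meeting both thresholds, or None."""
        for site in candidates:
            if (site["latency_ms"] <= max_latency_ms
                    and site["cost_per_gb"] <= max_cost_per_gb):
                return site
        return None

    print(satisfice(sites, max_latency_ms=20, max_cost_per_gb=0.10))
    # Picks edge-east: good enough, even though edge-west might score "better."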

Zipf's Law (think top-10 lists, the 80/20 rule, The Long Tail, etc.) and other power-law behaviors will still apply to the end product of such analysis, but perhaps the informed payloads will ease the analysts' management of these turbulent parcels. (Some insights into the framing of the problem of getting top-level insight into systems structures and how they express emergent behaviors can be found at the Santa Fe Institute and its many papers on "Small World" problems.)
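
To put a rough number on the top-10 intuition, here is a back-of-the-envelope calculation assuming a plain Zipf distribution (rank r gets weight 1/r) over a made-up catalog of 10,000 titles:

    # Back-of-the-envelope Zipf's Law: the item at rank r gets weight 1/r.
    # With 10,000 titles, how much of the traffic do the top 10 soak up?
    N_ITEMS = 10000
    TOP_N = 10

    weights = [1.0 / rank for rank in range(1, N_ITEMS + 1)]
    total = sum(weights)
    top_share = sum(weights[:TOP_N]) / total

    print("Top %d of %d items draw %.0f%% of accesses" % (TOP_N, N_ITEMS, 100 * top_share))
    # Roughly 30% -- which is why "put the top 10 movies on the edge" pays off.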

So, the bounds on this problem of course reduce to time and money. That topic is also taken up by Gunther, with emphasis upon what some of my old gang at the Wall Street joint referred to as "the giggle test" for feasibility.

This is a brief piece about an intriguing problem where more insight can be gained from Operations Research methodologies than from Information Technology praxis per se.
It nets out to (sorry) not only "if it isn't measured, it isn't managed," but also the cautionary insight that "if it isn't modeled, it isn't managed."