Sunday, July 13, 2014

Proposing Powerful Portable Pipelines

IBM calls the package CMS/TSO Pipelines. A certain group of us affectionately call it "Hartmann Pipes". It's standard equipment on VM/CMS, but a lot of people upgrade to the author's version. On TSO (part of z/OS), it's optional. Those who learn even the simplest of its capabilities tend to get really hooked on it.

When you're excited about something, you talk about it. But discussing Hartmann Pipelines with the uninitiated is difficult. Most people have seen command line pipes (Unix, Linux, even Windows), so they wonder, "What's the big deal?" How is this implementation different from other pipelining schemes? Aye, there's the rub. I'll get into that detail later, and some readers will need that rationale before any of this makes sense. First ... the proposal.

There are two audiences. There are those who don't know CMS Pipelines. I gotta sell you on this wonderful thing. There are those who do know CMS Pipelines. I gotta sell you on this particular implementation. It includes a compromise; I gotta sell you on the compromise. Maybe from the two groups there will be enough volunteers to achieve critical mass.

The Proposal

There have been several ports of pipelines. Prior efforts require one or more substantial support environments. The idea here is to use pure C. Interaction between the stages is handled entirely by these three functions ...

  • output(p,d,l) /* blocks until record is consumed */
  • peekto(p,b,l) /* examines a record without consuming */
  • readto(p,b,l) /* consumes a record (and gets contents) */

output() is used by a producer stage. The producer is blocked until the record is consumed. p is a pointer to a pipe struct referencing a connection, in this case an output connector. d points to the data, which is not necessarily a NUL-terminated character string. There are no sacred characters. l gives the length of the record.

peekto() and readto() are used by a consumer stage. They're identical except that peekto() leaves the record pending, and leaves the producer stage blocked. p is a similar pipe struct pointer to that used for output(), except that it references an input connector. b points to a buffer and l indicates its size.

These are easily implemented on any POSIX system using a pair of traditional pipes. We'll have one Unix-style pipe pointed "downstream" sending data and metadata, and another pointed "upstream" sending control signals. Some of the control signals are ...

  • tell me about the record
  • give me the data, used by both peekto() and readto()
  • unblock, "got it", producer can move along, used by readto()

The producer accepts these controls via the upstream pipe and sends metadata or data via the downstream pipe. The producer blocks until it gets a "got it" signal from the consumer. The consumer blocks until the producer has a record to send.

The "connector" struct starts with two file descriptors ...

struct PIPECONN {
    int  ctrl;    /* control from consumer to producer */
    int  data;    /* data from producer to consumer */

    /* ... other elements ... */
};

The data pipe carries either metadata or record data, depending on which signal arrived on the ctrl pipe.
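To make the handshake concrete, here is a minimal sketch of the three functions over a pair of POSIX pipes. It is an illustration under stated assumptions, not a finished design: the one-byte signal codes (SIG_STAT, SIG_DATA, SIG_NEXT), the 32-bit length used as the "metadata", the helper names read_full() and write_full(), and the return conventions are all invented here. The struct is trimmed to the two descriptors named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical one-byte control signals; the encoding is an assumption. */
enum { SIG_STAT = 'S', SIG_DATA = 'D', SIG_NEXT = 'N' };

struct PIPECONN {
    int ctrl;   /* control from consumer to producer */
    int data;   /* data from producer to consumer */
};

/* Write exactly len bytes, looping over short writes. */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; end-of-file or an error returns -1. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Producer side: answer requests until the consumer says "got it". */
int output(struct PIPECONN *p, const void *d, size_t l)
{
    assert(p != NULL && l <= UINT32_MAX);
    uint32_t len = (uint32_t)l;
    char sig;
    while (read_full(p->ctrl, &sig, 1) == 0) {
        switch (sig) {
        case SIG_STAT:                      /* "tell me about the record" */
            if (write_full(p->data, &len, sizeof len) < 0) return -1;
            break;
        case SIG_DATA:                      /* "give me the data" */
            if (write_full(p->data, d, l) < 0) return -1;
            break;
        case SIG_NEXT:                      /* "got it": unblock */
            return 0;
        default:
            return -1;
        }
    }
    return -1;                              /* consumer went away */
}

/* Shared consumer logic: ask for the length, then the data. */
static long fetch(struct PIPECONN *p, void *b, size_t l, int consume)
{
    char sig = SIG_STAT;
    uint32_t len;
    if (write_full(p->ctrl, &sig, 1) < 0) return -1;
    if (read_full(p->data, &len, sizeof len) < 0) return -1;
    if (len > l) return -1;                 /* too big; record stays pending */
    sig = SIG_DATA;
    if (write_full(p->ctrl, &sig, 1) < 0) return -1;
    if (read_full(p->data, b, len) < 0) return -1;
    if (consume) {
        sig = SIG_NEXT;
        if (write_full(p->ctrl, &sig, 1) < 0) return -1;
    }
    return (long)len;
}

long peekto(struct PIPECONN *p, void *b, size_t l) { return fetch(p, b, l, 0); }
long readto(struct PIPECONN *p, void *b, size_t l) { return fetch(p, b, l, 1); }
```

Because the producer re-sends the same record every time it is asked, a consumer can call peekto() as often as it likes and the record stays pending until readto() sends the "got it" signal. A stage that wants to avoid delaying the record would peek, write its own output, and only then call readto().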

The Compromise

The above operations are under control of the host operating system. Some time back, I mentioned this idea to the Piper (Hartmann himself). His take on it was not clear. CMS Pipelines has its own dispatcher. (CMS originally did not have built-in multi-tasking.) I've mentioned the idea to several serious plumbers. They unanimously reject it, citing Hartmann's dispatcher as an essential requirement. But is it?

About the time I first published this article, David Craig shared his excursion into Plan 9 (that's a whole nutha journal entry) including discussion about Plan 9 security. I found this ...

"... weak but easy-to-use security can be more effective
than strong but difficult-to-use security if it is more likely to be used."

The point goes for more than just security. A weak (or not well tuned) but easy-to-use dispatcher can be more effective if it leads to a simpler implementation. So that's my pitch to the Pipes-knowing audience: let each stage be its own process, and let the host operating system do the dispatching.
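Here is a rough sketch of what "each stage is a process" could look like when wiring up one connection. The names spawn_producer(), stage_fn and demo_stage() are made up for this illustration, and the demo stage just echoes control bytes back rather than speaking the real record protocol. The only point is to show the two pipes being created, the fork, and each side keeping the ends it needs.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct PIPECONN {
    int ctrl;   /* control from consumer to producer */
    int data;   /* data from producer to consumer */
};

/* Hypothetical signature for a producer stage body. */
typedef int (*stage_fn)(struct PIPECONN *out);

/*
 * Fork a producer stage as its own process. On success the consumer's
 * ends of both pipes are stored in *in and the child's pid is returned.
 * Returns -1 on failure.
 */
pid_t spawn_producer(stage_fn stage, struct PIPECONN *in)
{
    int ctrl[2], data[2];
    assert(stage != NULL && in != NULL);

    if (pipe(ctrl) < 0)
        return -1;
    if (pipe(data) < 0) {
        close(ctrl[0]); close(ctrl[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(ctrl[0]); close(ctrl[1]);
        close(data[0]); close(data[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child (producer): reads control, writes data. */
        close(ctrl[1]); close(data[0]);
        struct PIPECONN out = { ctrl[0], data[1] };
        _exit(stage(&out) == 0 ? 0 : 1);
    }

    /* Parent (consumer): writes control, reads data. */
    close(ctrl[0]); close(data[1]);
    in->ctrl = ctrl[1];
    in->data = data[0];
    return pid;
}

/* Demo stage (an assumption): echo each control byte downstream until EOF. */
int demo_stage(struct PIPECONN *out)
{
    char sig;
    ssize_t n;
    while ((n = read(out->ctrl, &sig, 1)) == 1) {
        if (write(out->data, &sig, 1) != 1)
            return -1;
    }
    return n == 0 ? 0 : -1;
}
```

Closing the unused ends matters: the producer only sees end-of-file on its control pipe once every copy of the write end is closed, and that is how a stage learns its consumer has gone away. The kernel's scheduler is the dispatcher here, which is exactly the compromise.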

Now a word to the uninitiated.

A Record by Any Other Name

Hartmann Pipes convey records. "Eww ... such a mainframe concept." Well, yes, in fact. John Hartmann invented Pipelines on a mainframe and CMS/TSO Pipes runs on two mainframe operating systems.

Why do we fight over these things? If it were called a packet or a message or perhaps a block, then you wouldn't have a problem with it. I won't enumerate the reasons for using quantized data (records), leaving examples such as MQ and TCP as ample justification.

The value of the Hartmann implementation includes ...

  • record structure (if any) is retained
  • "delaying the record" can be tolerated or avoided, as needed
  • an entire record can be examined before it is consumed, which allows for in-flight changes to the pipeline based on content
  • a stage can be added, to input or output, between two connected stages
  • multiple streams can run concurrently, and fan-in or fan-out
  • output can be looped back to input

We could really use this on systems other than CMS and TSO.

-- R: <><

Originally published 2014-July-13.