by Paul R. Martin
In this essay I will describe eleven modest technical contributions I made during my 31-year career in the computing industry. I must emphasize "modest". I am no Steve Jobs or Stephen Wolfram. My contributions are trivial by comparison with theirs. Mine are known to only very few people. They will soon be lost forever unless I write them down. Plus it will be fun for me to reminisce a little, hence this essay.
There are two aspects of the computing industry that make it hard for outsiders to understand what goes on in the industry. First is the dynamic rate of technical change and the second is the abstract nature of the work.
The dynamic rate of change, as expressed by Moore's Law, is that the power and capacity per dollar of computer systems doubles every 18 months. This doubling rate has been going on for half a dozen decades now and it shows no signs of slowing any time soon. What this means is that new possibilities for exploitation continually present themselves. Problems that at one time were impossible, not only get solved, but the solutions soon become routine and shortly after become obsolete. Expertise has a short lifespan. Terminology continues to change and become obsolete. Computer technologists must keep running just to keep even with the changes. As a result, looking back on them now, my modest contributions seem almost trivial, even though a few of them were solutions to problems that were deemed impossible at the time.
The second aspect, that of the abstract nature of the work, means that computer professionals rarely produce anything they can show to anyone else. They can't really show their designs, or flowcharts, or code, or usually not even the outputs of their programs, to their wives or friends in order to show them what they do for a living. It's not at all like Architecture, where the architect can point to a building to show his or her work.
To drive home this last point, I'll digress for the moment and tell you an anecdote. My son Dave, who is now the president of his own thriving computer services company, learned to program at my knee, so to speak, on our Apple II computer when he was just a teenager. I had just upgraded the machine from 4K of memory to the relatively "giant" 32K. Dave designed, developed, and debugged a program to keep track of the sprint ratings of the members of his swim team. This involved the data entry of the times of the various members of his team for the various events from meet to meet, for the various strokes and the various distances. His program organized and stored the data, figured out things like rankings and trends, and it produced reports showing the results. I was proud of his work and so was he. When he first got clean output and a nice report, it just so happened that his coach was over at our house. Dave was justifiably excited to show his coach what he had accomplished and he ran upstairs to the kitchen where his coach was talking to my wife and me.
I'll never forget, and I'm sure Dave won't either, the coach's response after he had looked at Dave's machine printed sprint rating results. He looked at Dave, gave him a faint smile, then he looked up at the ceiling momentarily, and then he asked my wife, "Have you always had those ceiling lights?" To him, that sprint rating sheet didn't look much different from the ones he was used to that were written on a typewriter so it had almost no impact on him. He was totally oblivious to the work involved in developing the computer program necessary to produce it. And, it would be pointless to try to explain it to him unless he were genuinely interested.
So, I am not writing this essay to take anyone's mind off the latest fashion in indirect lighting fixtures. I am only setting this down for anyone who is mildly interested in some of the things I did during my computer career, for what it's worth.
When I started my career, programming for Boeing in 1963, my first programming assignment was to take over and complete a program being converted to run on an IBM 7080 computer. This was a "huge" multi-million dollar machine with an unheard of memory of 160K. That was enormous power and size for those days. And it was expensive. It cost about $400 to assemble a program on it, so we had restrictions on how many times we were allowed to reassemble without getting a management review. Between assemblies, we keypunched patch cards which we inserted into the object deck.
My program kept track of the way Minuteman Missiles were built. It started with the engineering specifications for how to build one, but then it also contained all the specific details for each individual missile, or "bird". The detailed information contained such things as serial numbers for parts that of course weren't in the engineering specs. The engineering specs, however, contained all the part numbers of the parts that went into a bird.
The major work of my program was to generate a "parts explosion". The engineering specs contained a hierarchy of part numbers which told which parts went into an assembly. Each assembly had its own part number and many assemblies went together to produce a higher level part which also had its own part number. The hierarchy had seven or eight levels all the way up to the final part, the bird itself, which had its own part number.
Many, if not most, parts contained multiple parts with the same part number. For example, at the lowest level, a part might contain 50 screws, each with the same part number. In the engineering specs, these parts would show up only once in the description of an assembly, with a field called Quantity Per Assembly (QPA). My program had to generate multiple records according to the QPA so that each individual occurrence of a part had its own record. Of course this meant that the generation of records "exploded" as the effects rippled down through the hierarchy. The files representing a particular bird thus contained many more records than did the engineering specs.
Since the only storage media available at the time were either punched cards or magnetic tape, these records were all stored on magnetic tape. Thus, the computer could only access the records in sequence. The hierarchical structure of the parts in a bird was exactly like that of an organization chart where everyone (except the CEO) has exactly one boss. The files were kept in a sequence exactly like the sequence of an indented outline, where for example Roman numerals might number the highest level, capital letters might identify the next level, Arabic numbers for the next, and so on. Each record on the file contained a field which told the level of that particular record so that the structure of the hierarchy was maintained.
"Exploding" the file meant producing as many output records as the QPA for each input record. But, they couldn't all be produced at once. After one of them was produced, a record for each of the parts below that one in the hierarchy had to be produced right behind it. Then the next one, with all its subordinate parts, could be produced and so on for the QPA. But you can see, that this same thing could happen at any level below, so a truly enormous quantity of records for a particular bird would result.
That was probably more detail than I should have gone into, but it was fun to reminisce, so please forgive me. Now let me start describing my specific contributions.
1. Used-on part numbers
One problem with the program that I inherited, and which had been running on much less capable computers, was that there was a field in each record called "Used On". That was supposed to contain the part number of the part on which this one is used. In other words, the part number of the assembly one level up. Well, the record for that part in the file might be many, many records earlier in the file. The intervening records would be for parts that were also used on that assembly, along with a lot of subordinate parts that were used on them, and so on.
So, when the printout was produced, there was a blank space for Used On Part Number that engineers would have to ink in by hand later throughout the whole document. (These documents were four or five inches thick!) I asked my supervisor why we didn't have the computer fill in those blanks. He told me that many smart people had looked at the problem and determined that it was impossible. It didn't seem that way to me and I told him so. He said, "If you think you can do it, go ahead."
I figured that the way to solve the problem was to use that enormous memory available to me to keep track of the part numbers at each level that have been processed above the one being processed and pick out the right one for each record as needed. It worked, and I amazed my boss.
2. Freeform notation
On the same project, the way things worked was that an army of engineers laboriously wrote out the original specifications by hand. These were then keypunched to form the input file for my program. The engineers also hand wrote the explosion for each bird. Their handwritten copy was the official document and it was this document that was formally accepted by the Air Force each time they actually bought a missile. In the meantime, during the months and years that the missile made its way through the production line, and was transported to a missile base and installed, the ever-changing document (both my machine produced one and the handwritten one) followed along for myriad uses. Mine was always several weeks more current than the engineer's because as soon as their update sheets were produced they were keypunched and after a night's run, the computer produced my document. The engineers, on the other hand, had to compile all their paper copies and then bind and publish them.
That delay became important at various stages of the process, particularly right up close to the Air Force approval process. The actual turnover of a bird was typically delayed several weeks as a result. That cost Boeing money and it was a big deal.
I asked my boss why, now that we could produce the used-on part numbers, didn't the Air Force accept our document as the official one and save those many weeks? He said there is one important difference between the documents that makes all the difference. That is that the engineers make many notations of explanations and exceptions on the sheets that become part of the official documentation. Again, it had been deemed impossible to produce these free form, ad hoc, comments on the machine produced document.
Once again, I told him that I could solve that problem and he said, "Go for it.". I simply invented a new record type that was not fielded, as records typically were in those days, and came up with procedures and instructions for keypunch as to how to record the handwritten comments. I simply printed these comments on the face of the document in the blank space.
The result, which followed a tense period during which the engineers tried in vain to find fatal differences between their document and mine, was that an army of engineers was let go and the Air Force accepted my document as the official one. I felt sort of bad, but that is the nature of the business.
When I joined IBM in 1964 I got in on the ground floor of the development of the System/360 operating system. My specialty was the "Supervisor" function of the operating system. The Supervisor controls and allocates the CPU resources of the processor itself and the memory. It also handled the interrupts (which gets into a little more technicality that I want right now). The individual programs competed for memory and for CPU time and the Supervisor granted and revoked access to them. The competitors for CPU time were called 'Tasks'. There might be many tasks in one running program and there might be many programs running in a single task. But it was the tasks that presented themselves to the Supervisor requesting time on the processor.
The structure of the tasks in the system was exactly like the structure of the Minuteman part number files, so I already had some expertise in how to deal with this type of tree structure. I saw the need to be able to create arbitrary structures of tasks within the system, both as a diagnostic to test the various functions of the Supervisor, and also to implement programs for which a multiple task structure would provide some benefit. So I designed and wrote such a program which would GENerate a TREE of tasks to any specification and then provide these tasks with the programs they would need to run in order to accomplish the objective. It was an early general purpose multi-tasking program written in 1965. An IBM paper was published describing it and it found some use within the IBM development community.
In the early 360 operating systems, it soon became mysterious as to what was actually going on inside the system when many programs and many tasks were running. The CPU had a "Wait Light" on the front panel that was on any time the CPU was not actually executing instructions. People would stare at that light. If it was on a lot, the question was, "What is it waiting for?" If it was out a lot, people would ask, "What is it so busy doing?" To help answer those questions, I designed and wrote a program, which Bernie Weinstein named for me. He called it the INternal System Information Gathering and Handling Technique, or INSIGHT. (As an aside, I'll never forget Bernie Weinstein. I had some extra money to invest and he advised me to buy Pan Am. After all, he explained, people will always be flying and if you buy Pan Am stock and forget it, it can do nothing but grow over the long term. Fortunately for me, circumstances forced me to sell my stock for roughly what I paid just before the stock collapsed to zero and Pan Am disappeared. I don't fault Bernie -- the decision was solely mine -- but I will never forget him.)
When you ran INSIGHT on the computer, it would continually change its priority, for competing for CPU time, to be the lowest of any task in the system. That way it would run only when the Wait Light would otherwise be on. It wouldn't affect the performance of anything else in the system except for the small amount of memory it took up. Each time INSIGHT was allowed to execute instructions, it would look around inside the system and take notes on what was going on. The major question it asked was, Why are all the other tasks waiting? Statistics were kept on this kind of information. Summaries of these statistics could be produced on demand by the operator, and they were reported when INSIGHT was shut down. The program was distributed pretty widely to early IBM customers to give them some much appreciated "insight" into what was going on in their enigmatic new systems. A description of the program was published as an IBM paper.
Throughout my career, computer memory was always one of the most precious and expensive resources. You never had enough and always wanted more. Naturally, when customers ran their systems, they wanted to know which programs and tasks are using up how much memory. Memory hogs and wastrels were not tolerated. But who were they? I designed and wrote a program which somebody (maybe Bernie again) helped me name Region And Nucleus General Examination, or RANGE. Individual user jobs ran their programs in what were called "Regions" in the operating system. These were areas of memory allocated to a particular job at a particular time. The Nucleus of the operating system also took up a bunch of memory which the customers felt like IBM was taking from their, the customers', precious memory. RANGE provided a dynamic picture of the use of memory by all the running jobs and by the system itself. This too was distributed and used in the field by customers to understand the operation of their systems better. And, it too, was written up and published in an IBM paper.
6. ACTT and ACTT II
As I said earlier, people used to stare at the Wait Light on the CPU and wonder about what was going on. It occurred to me that you could unplug the lamp and plug in some kind of electrical counter that would tally up the amount of time the light would have been on. I asked some hardware guys in our department and found out that they could easily build such a device and they eagerly did so. We called it the Available Compute Time Totalizer. The hardware guy and I presented a paper on the device at some kind of IBM meeting and I think a patent was applied for. I'm not sure where that went.
Some time later, the same hardware guy approached me and told me that he could build an electronic version of the electro-mechanical ACTT and asked me whether I would collaborate on what the logic of the machine should do. There were a lot more signals available on the CPU panels that we could take advantage of beside the Wait Light. I worked with him to design and build the next version, which we naturally called ACTT II.
7. Interrupt diagnostic
In 1968 I transferred with IBM from Poughkeepsie NY to Seattle. I started out on the Boeing team providing IBM Systems Engineering services to the same people I used to work with. Now they were trying to understand their new System/360s and I was in a great position to help them. After a year on the Boeing team, I joined a skunk works project led by John Maloof where we tried to modify a special program, written for a special computer (the model 67), so that it would run on a general purpose 360 computer. The plan was to use the 360 interrupt system in order to provide function that was specially provided in model 67. This was very tricky and it depended on a very precise knowledge of what happened exactly in the computer during the various interrupts. There were four or five general purpose models of the 360 at the time (models 30, 40, 50, 65, 75, and 95). We had access to the middle four on that list in order to do our testing. I soon discovered that there were differences among the models in the way they processed interrupts. These differences did not affect the operation of normal programs, but they sure had an effect on the program we were developing. Since the machines didn't behave precisely according to the specs, I contacted IBM engineers representing each of the models and told them about my discoveries. They were surprised and asked for details.
So, I wrote a program that observed and recorded an extreme low level of detail about what exactly goes on in the hardware under a variety of different conditions. This information was given to the engineers, some of which thanked me for finding hardware bugs. I don't know the extent, if any, to which this program was useful, but it was fun to do.
8. OC7 Catcher
In 1970 I joined the IBM team working with PNB (Pacific Northwest Bell, the forerunner of USWest and later Qwest). They were in the process of a massive conversion of the billing and collecting programs from an older generation of computers (the IBM 7074) to the new System/360. They were also converting the programs from the machine-specific 7074 Autocoder to COBOL.
As it turned out, the COBOL compilers at that time were very intolerant of numeric fields that didn't contain numbers. What would happen would be that the program would just "blow up" and give you an error code -- the dreaded '0C7' (pronounced 'Oh see seven'). Of course the programmers are supposed to edit the fields and not even attempt to add things that are not numbers, and in thousands of cases, they did it right. But during the debugging process, it was extremely annoying to have the program blow up with an 0C7 before it even got to the real test. It wasted a lot of time and was a super aggravation. They asked me for help.
I wrote a program, called The (later "Infamous") 0C7 Catcher. If a program called the 0C7 Catcher at the beginning, then if it blundered into non-numeric data later on, instead of stopping with an 0C7, my program would collect and store the diagnostic data, plug in some arbitrary good number in the bad field, and then let the program resume its execution. This turned out to be a godsend for the development process and it endeared me to a bunch of programmers.
Then, after the celebrations of the successful completion of the project and the conversion of the live system, an occasional 0C7 would blow up the nightly billing runs. This had a much more severe impact than merely inconveniencing a junior programmer. Instead, what happened is that the company-critical flow of millions of dollars in revenue was stopped cold offering nothing but an 0C7 code as consolation. The programmer would have to be called out of bed to come down and fix the problem in order to get the bills out.
After a few panics of this sort the boss approached me and said, "Why can't we use the 0C7 catcher to avoid this problem?"
"Well," I said, "it's because the program just makes up numbers to stick in bad fields, so you will be changing numbers representing real business quantities to arbitrary garbage numbers." He insisted. He asked me to modify the program so that sufficient diagnostic information was produced, and operating procedures be developed so that the programmer would still be called, but at least the program would continue to run and keep the process going. I added all the safeguards he wanted, wrote up the procedures, trained the operators and programmers, and the program was implemented in the live production runs.
In the early years, people followed procedures. Later, the 0C7s, became routine and shortcuts were taken, and later still, the incidents were ignored. I think many customers were billed using my phony numbers. I remember the decision finally being made to remove the 0C7 Catcher completely from all production programs and I remember the weeping and gnashing of programmer's teeth as a result. My program became infamous.
This was a fun project that I took on myself and which went nowhere. I did get a paper accepted with a "Published Rating" within IBM just to protect IBM's patent rights to the idea. The idea was to provide a Direct Access Retrieval Technique for data stored in a sequential file. Sequential files, particularly in the old days when that's all there was, are useful for processes that can run in that sequence. But in many cases, it would be nice to be able to access a record directly from that kind of file knowing only the key. What the DART program did was to generate an approximate address of the desired record, and then use the fact that the data is stored sequentially in order to do a final search to find it. (I know, I know, most people try to say "data are" because they seem to think that is more correct. I don't agree. I think 'data' is a word like 'bread' or 'sand' or 'snow'. We say "The bread is on the table", or "The snow is piled up." So data comes in bits. Bread comes in slices, sand comes in grains, and snow comes in flakes. I won't change.)
The key to the program is that the keys to the data be described as accurately as possible. What I mean is that the type of data that can be in each position of the key should be specified as accurately and as narrowly as possible. For example, if the key, or part of it contains a date, then the values that can appear in the month part of it can only be numbers from 1 to 12. For keys like part numbers, there may be standards that say that a certain position must contain an 'L' or an 'R' and no other value. Or it must be numeric, or it must be one of several letters, or whatever.
This specification is provided to DART and it is used to convert keys to numbers. The key conversion algorithm uses a variable radix. That is, the radix for converting the number may vary from position to position in the key. In normal mathematics, a radix is constant across the entire number. For example, in our ordinary base 10 system, the radix is 10 and it is 10 for every place position. The same with binary numbers; the radix is 2 for each position. The contribution of each place to the overall value of the number is the value of the digit in that place multiplied by the radix raised to the power of the place position minus one, counting place positions from right to left. That's a lot of words, but think of the number 123 in base 10. The value of the number, one hundred twenty three, is computed by taking 3, the value of the digit in the first place, times 10 (the radix) raised to the power zero ( 1, the place position, minus one is zero) to get 3. Then moving to the second place, we take 2, the digit we find there, and multiply it times 10 (the radix) raised to the power 1 = 2 -1, to get 20, and finally from the 3rd place we take the 1 and multiply it times 10 raised to the power 2 to get 100. Adding 100 + 20 + 3 gives us the number 123.
Anyway, it is customary for the radix to be the same for all positions in a normal number system. In the DART implementation, the radix is a constant for any particular position, but it may be different from position to position. In the example I used earlier, a position that can contain only an 'L' or an 'R' would have a radix of 2. If a position always has only a letter, the radix would be 26 (there are 26 letters in the alphabet).
It is fairly straightforward to see how a number is computed from its representation in this variable radix system. The value of the first position (on the right) is one. The value of the next position (to the left) is the radix value of the first position. In general, the value of any position of the key is the product of the previous (the next one on the right) radix value and the previous (the same next one on the right) place value. This is also true for standard number systems -- check it out for base ten or base two -- but since the radix is a single number in those cases, it is more convenient to express the place values in terms of powers of the radix.
Anyway, before DART can do anything useful, it must have access to the entire file. It makes a sequential pass through the file examining and converting each key as described above. Now, if you imagine plotting those values against the position, or address, of the record in the file, you will get some kind of always-increasing curve. Where the density of records in the key space is high, you will get a gentle slope; where the density is sparse, the curve will be steep. You provide a parameter to DART saying how much memory you can give it for storing the characteristics of the distribution of keys throughout the file. DART then builds the best polygonal curve fit it can to represent this distribution in the space you gave it. Think of it as having a limited number of pins to stick into that plotted curve so that when you connect consecutive pins with rubber bands, you get the closest approximation to the curve. You will most likely stick the pins into the corners where the slope of the curve changes abruptly.
Once this distribution has been captured, DART is in the position to be able to retrieve records. When presented with a key, DART converts it using the variable radix scheme. Then it finds the segment of the polygonal curve which contains this number. If it happens to hit a corner exactly, then DART knows the exact address of the record. If it falls between corners, than a linear interpolation is done to calculate the approximate address for the record. By retrieving the record at this address, DART knows which way to look from there and depending on the system, may do a serial or a modified binary search on the correct side to find the record.
I did implement DART and tested it against a couple data access methods that were then current and I satisfied myself that DART could compete favorably. But I never got any interest in the program that amounted to anything so it went nowhere. It was fun anyway.
While I was working with USWest virtual memory systems became the standard in IBM's big computers. In these systems, pages of data were assigned to locations in memory that would change over the course of the program's execution. If the page wasn't being used much, and there was a demand for memory by other programs, or even by this one, that page might be written out to a disk so that the memory could be used for another purpose, and then later, when the data from that page was needed again, it would be brought back in from disk. But by then, the old memory location was now being used for something else so the page would have to be reassigned to another memory location. These systems were characterized as paging systems.
The customers, in particular USWest, were concerned that the demand for memory might be high enough that it would cause a flurry of paging so that the efficiency of the computer would be affected. The computer might be doing nothing but paging and not getting any useful work done. They built benchmark jobs to test the performance of the system and correlate it to the paging rate. But to get any reasonable measurements, they needed to be able to control the background paging activity.
Randy Foldvik came to me with this problem. I wrote a program, which Randy later named PAGEM which would create an artificial demand for pages such that a specified paging rate for the system would be sustained. Randy's boss was skeptical and opined that that problem was too hard to solve. He predicted that the program would act like Galloping Gertie did over the Tacoma Narrows and go into uncontrollable oscillation. He said he had tried to do something like that and he was unable to solve the damping problem.
I might have gotten lucky, but I had anticipated the damping problem and I made some swags and used my intuition to implement routines that would notice change rates and thresholds and make corresponding adjustments in non-linear fashion to prevent over-reacting but yet be responsive enough in order to hold the paging rate steady at the target value. I distinctly remember the first live run. Randy and I were in the machine room with Ray Hitchcock. Ray was sort of in charge of the overall installation but this was his first real involvement in this particular project of measuring the effects of the paging rate. I asked him to pick a paging rate. He grinned and said, "Thirty pages per second". The operator typed '30' on the console and we looked at the system paging statistics. The paging rate was 30. Ray picked some other number, I forget what, that number was typed in, and the paging statistics immediately changed to that number. The program worked exactly as planned.
That was my last involvement with that program. It became the property of USWest as a result of a service contract and Randy ran with it from there, doing his work of keeping the systems in tune. It was another gratifying moment there at the system console seeing that my program did what at least one expert thought was impossible.
Following the inexorable track of Moore's Law, IBM systems kept getting bigger and faster. At some point, the increased speed was achieved in part by coupling two or more computers together in order to form one system. These were called multiprocessing systems. What it meant from an operating system viewpoint was that the Supervisor had more than one CPU that it could allocate to tasks competing for time. To the extent that there were typically many competing tasks, this was an effective way of speeding up the processing workload.
But if the workload consisted of a single big long-running CPU hog, having multiple processors available wouldn't be of much help. And, USWest had some big programs like that. It seemed to me that the problem could only get worse as the computers went from two to four and more CPUs coupled together. It seemed to me that a solution might be to provide a way for programs to be split up into many independent tasks so that they could effectively use multiple CPUs.
The 360 Operating System provided capabilities to do just that, but the COBOL compilers of the time didn't. I decided to design and write a program which would provide a suite of callable services to COBOL programs (or any program language that provides a CALL interface) so that programs can be designed to run as multiple tasks. I wrote the program as what IBM called a "Field Developed Program" (FDP). These were programs written by employees in the field (i.e. not in the development divisions) and which were officially made IBM products available to our customers. I never came up with a name for my program that people in the FDP department would accept so I ended up calling it DNL. This was short for the product number assigned to it which was 5740-DNL.
I didn't get any customers interested in the program, although I think a few licenses for it were sold in other parts of the country. But I did attract the interest of an IBM Fellow in Kingston NY. His name was Enrico Clementi and he had a staff of scientists working for him using the largest computers they could find to do quantum calculations. Enrico was a Quantum Chemist, but he had scientists of other flavors working with him. He wanted to take the programs that they were using and have me apply DNL to them in order to run them on IBM's newest multiprocessors. I went to Kingston several times to work with his people on what was the high point of my career. Not that I accomplished anything noteworthy, but the experience of working with a high powered group with that much talent and resources was totally beyond any other professional experience I had had. It was exhilarating. I tried to learn everything I could about Quantum Physics, how scientists work and think, and whatever else I could learn by attending various seminars with them on related disciplines. I had a great time.
As for DNL, it was a little humiliating. Unknown to me for quite a while, I had a sinister bug in the bowels of my program. It only caused trouble in cases that would occur one in ten billion, or ten trillion times. But Enrico's programs scared the bug up. I spent a majority of my time with Dr. John Detrich trying to isolate and diagnose the problem. After many days of intense work poring over diagnostic data, we began to suspect an algorithm I had written to control a queue of requests. Logically it was very similar to the drum used in short order restaurants. The waitresses and waiters clamp their orders to the drum and the short order cooks on the other side spin the drum and take the orders off and fill them. There may be many wait people adding orders and there may be many cooks working them off, but they need to be done in sequence, each one done only once, and none of them getting lost.
I had used a particular computer instruction (Compare and Swap) in the implementation of my algorithm. When Dr. Detrich learned that, he closed his eyes and slowly leaned back with his hands behind his head and said something like, "Hmmm. I think I remember something like that." Then he excused himself, left the office for a few moments, and came back thumbing through a rather thick journal. When he found the page, he said, "Yeah, I thought I had heard this before" and he handed the book to me.
There I saw the title of a paper which was something like "Managing a FIFO Queue Using Compare and Swap with no Locks". That was exactly what my algorithm was doing so I got VERY interested. I still remember the very first sentence of that paper. It said, "To date, there is no known solution to this problem."
Well, that was a defining moment in my career. It immediately changed my work plans. In short, I read the paper, learned what my bug was, learned that there was a solution to the special case of my implementation and the paper showed what that solution was, but I was nearly out of time. So instead of fixing the program right, I took John's suggestion and implemented software locks around the algorithm. This is a performance compromise which I didn't like, but it did remove the bug.
The project was over for me shortly after that and I didn't follow what happened in Enrico's lab much from then on. But it was a wonderful experience that I might write up in another essay. But for now, I just wanted to explain the program in the context of other programs I have written that have sort of defined part of my career. Thanks for reading all the way through this, if you did.
January 12, 2008
©2008 Paul R. Martin, All rights reserved.