
The Silver Bullet:

Why Software Is Bad and What We Can Do to Fix It

 

 

 

 

Abstract
Software Is Bad and Getting Worse
  The 'No Silver Bullet' Syndrome
  Ancient Paradigm
Why the Experts Are Wrong
  Turing's Baby
  A Fly in the Ointment
  The Hidden Nature of Computing
  Fateful Choice
  Emulation vs. Simulation 
  Turing's Monster
  Vested Interest
  There Is a Silver Bullet After All
  Deconstructing Brooks' Complexity Arguments
  Targeting the Wrong Complexity
  The Billion Dollar Question
The Silver Bullet
  Why Software Is Bad
  Why Hardware Is Good
  Programs as Communication Systems
  Event Dependencies and the Blind Code Problem
  The Cure for Blind Code
  Software Design vs. Hardware Design
  Thinking of Everything
  Plug-Compatible Components
  Event Ordering Is Critical
  Von Neumann Architecture
  Software IC's with a Twist
  Failure Localization
  Boosting Productivity
Conclusion
  Slaying the Werewolf
  Rotten at the Core
   

Abstract: There is something fundamentally wrong with the way we create software. Contrary to conventional wisdom, unreliability is not an essential characteristic of complex software programs. In this article, I will propose a silver bullet solution to the software reliability and productivity crisis. The solution will require a radical change in the way we program our computers. I will argue that the main reason that software is so unreliable and so hard to develop has to do with a custom that is as old as the computer: the practice of using the algorithm as the basis of software construction (*). I will argue further that moving to a signal-based, synchronous (**) software model will not only result in an improvement of several orders of magnitude in productivity, but also in programs that are guaranteed free of defects, regardless of their complexity.

Software Is Bad and Getting Worse

The 'No Silver Bullet' Syndrome

Not long ago, in an otherwise superb article on the software reliability crisis published by MIT Technology Review, the author blamed the problem on everything from bad planning and business decisions to bad programmers. The proposed solution: bring in the lawyers. Not once did the article mention that the computer industry's fundamental approach to software construction might be flawed. The reason for this omission has to do in part with a highly influential paper that was published in 1987 by a now-famous computer scientist named Frederick P. Brooks. In the paper, titled "No Silver Bullet--Essence and Accidents of Software Engineering", Dr. Brooks writes:

But, as we look to the horizon of a decade hence, we see no silver bullet. There is no single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.
...

Not only are there no silver bullets now in view, the very nature of software makes it unlikely that there will be any--no inventions that will do for software productivity, reliability, and simplicity what electronics, transistors, and large-scale integration did for computer hardware.

No other paper in the annals of software engineering has had a more detrimental effect on humanity's efforts to find a solution to the software reliability crisis. Almost single-handedly, it succeeded in convincing the entire software development community that there is no hope in trying to find a solution. It is a rather unfortunate chapter in the history of programming. Untold billions of dollars and even human lives have been and will be wasted as a result.

When Brooks wrote his famous paper, he apparently did not realize that his arguments applied only to algorithmic complexity. Most people in the software engineering community wrongly assume that algorithmic software is the only possible type of software. Non-algorithmic or synchronous reactive software is similar to the signal-based model used in electronic circuits. It is, by its very nature, extremely stable and much easier to manage. This is evident in the amazing reliability of integrated circuits. See Targeting the Wrong Complexity below.

Calling in the lawyers and hiring more software experts schooled in an ancient paradigm will not solve the problem. It will only be costlier and, in the end, deadlier. The reason is threefold. First, the complexity and ubiquity of software continue to grow unabated. Second, the threat of lawsuits means that the cost of software development will skyrocket (lawyers, experts and trained engineers do not work for beans). Third, the incremental stop-gap measures offered by the experts are not designed to get to the heart of the problem. They are designed to provide short-term relief while keeping the experts employed. In the meantime, the crisis continues.

Ancient Paradigm

Why ancient paradigm? Because the root cause of the crisis is as old as Lady Ada Lovelace, who invented the sequential stored program (or table of instructions) for Charles Babbage's analytical engine around 1842. Designed around gears and rotating shafts, the analytical engine was the first true general-purpose numerical computer, the ancestor of the modern electronic computer. But the idea of using a step-by-step procedure in a machine is at least as old as Jacquard's punched cards, which were used to control the first automated loom in 1801. The Persian mathematician Muhammad ibn Mūsā al-Khwārizmī is credited with having invented the algorithm in 825 AD, as a problem-solving method. The word algorithm derives from 'al-Khwārizmī.'

Why the Experts Are Wrong

Turing's Baby

Early computer scientists of the twentieth century were all trained mathematicians. They viewed the computer primarily as a tool with which to solve mathematical problems written in an algorithmic format. Indeed, the very name computer implies the ability to perform a calculation and return a result. Soon after the introduction of electronic computers in the 1950s, scientists fell in love with the ideas of famed British computer and artificial intelligence pioneer, Alan Turing. According to Turing, to be computable, a problem has to be executable on an abstract computer called the universal Turing machine (UTM). As everyone knows, a UTM (an infinitely long tape with a movable read/write head) is the quintessential algorithmic computer, a direct descendant of Lovelace's sequential stored program. It did not take long for the Turing computability model (TCM) to become the de facto religion of the entire computer industry.

A Fly in the Ointment

The UTM is a very powerful abstraction because it is perfectly suited to the automation of all sorts of serial tasks for problem solving. Lovelace and Babbage would have been delighted, but Turing's critics could argue that the UTM, being a sequential computer, cannot be used to simulate real-world problems which require multiple simultaneous computations. Turing's advocates could counter that the UTM is an idealized computer and, as such, can be imagined as having infinite read/write speed. The critics could then point out that, idealized or not, an infinitely fast computer introduces all sorts of logical/temporal headaches since all computations are performed simultaneously, making it unsuitable to inherently sequential problems. As the saying goes, you cannot have your cake and eat it too. At the very least, the TCM should have been extended to include both sequential and concurrent processes. However, having an infinite number of tapes and an infinite number of heads that can move from one tape to another would destroy the purity of the UTM ideal.

The Hidden Nature of Computing

The biggest problem with the UTM is not so much that it cannot be adapted to certain real-world parallel applications but that it hides the true nature of computing. Most students of computer science will recognize that a computer program is, in reality, a behaving machine (BM). That is to say, a program is an automaton that detects changes in its environment and effects changes in it. As such, it belongs in the same class of machines as biological nervous systems and integrated circuits. A basic universal behaving machine (UBM) consists, on the one hand, of a couple of elementary behaving entities (a sensor and an effector) or actors and, on the other, of an environment (a variable).

[Figure: Universal Behaving Machine. Actors: Sensor, Effector. Environment: Variable.]

More complex UBMs consist of arbitrarily large numbers of actors and environmental variables. This computing model, which I have dubbed the behavioral computing model (BCM), is a radical departure from the TCM. Whereas a UTM is primarily a calculation tool for solving algorithmic problems, a UBM is simply an agent that reacts to one or more environmental stimuli. As seen in the figure below, in order for a UBM to act on and react to its environment, sensors and effectors must be able to communicate with each other.
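The basic UBM described above (one sensor, one effector, one environment variable) can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the class names, the change-detection rule (signal only when the condition goes from false to true) and the `run_demo` driver are all hypothetical and are not taken from COSA.

```python
# Hypothetical sketch of a basic "universal behaving machine" (UBM).
# All names here are illustrative assumptions, not part of COSA.

class Environment:
    """A single environmental variable."""
    def __init__(self, value=0):
        self.value = value


class Effector:
    """Effects a change in the environment when it receives a signal."""
    def __init__(self, env, delta):
        self.env = env
        self.delta = delta

    def fire(self):
        self.env.value += self.delta


class Sensor:
    """Detects a change in the environment and signals connected effectors."""
    def __init__(self, env, condition):
        self.env = env
        self.condition = condition
        self.targets = []
        self.was_true = False

    def connect(self, effector):
        self.targets.append(effector)

    def sense(self):
        is_true = self.condition(self.env.value)
        fired = is_true and not self.was_true  # signal only on a change
        self.was_true = is_true
        if fired:
            for effector in self.targets:
                effector.fire()
        return fired


def run_demo():
    env = Environment(0)
    reset = Effector(env, -10)               # acts on the environment
    sensor = Sensor(env, lambda v: v >= 10)  # reacts to the environment
    sensor.connect(reset)                    # sensor-to-effector signal path
    trace = []
    for _ in range(4):
        env.value += 5   # external stimulus
        sensor.sense()
        trace.append(env.value)
    return trace
```

Note that the signal pathway between sensor and effector is an explicit object relationship here (`connect`), rather than something implied by the order of statements.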

The main point of this argument is that, even though communication is an essential part of the nature of computing, this is not readily apparent from examining a UTM. Indeed, there are no signaling entities, no signals and no signal pathways on a Turing tape or in computer memory. The reason is that, unlike hardware objects which are directly observable, software entities are virtual and must be logically inferred.

Fateful Choice

Unfortunately for the world, it did not occur to early computer scientists that a program is, at its core, a tightly integrated collection of communicating entities interacting with each other and with their environment. As a result, the computer industry had no choice but to embrace a method of software construction that sees the computer simply as a tool for the execution of instruction sequences. The problem with this approach is that it forces the programmer to explicitly identify and resolve a number of critical communication-related issues that, ideally, should have been implicitly and automatically handled at the system level. The TCM is now so ingrained in the collective mind of the software engineering community that most programmers do not even recognize these issues as having anything to do with either communication or behavior. This would not be such a bad thing except that a programmer cannot possibly be relied upon to resolve all the dependencies of a complex software application during a normal development cycle. Worse, given the inherently messy nature of algorithmic software, there is no guarantee that they can be completely resolved. This is true even if one had an unlimited amount of time to work on it. The end result is that software applications become less predictable and less stable as their complexity increases.

To Model or Not to Model

It can be convincingly argued that the UBM described above should have been adopted as the proper basis of software engineering from the very beginning of the modern computer era. Note that, whereas a UBM can directly model a UTM, a UTM can only simulate a UBM (using an infinite loop). The reason is that a UBM is synchronous (**) by nature, that is to say, more than two of its constituent objects can communicate simultaneously.

In a UTM, by contrast, only two objects can communicate at a time: a predecessor and a successor.

The question is, is an emulation of a parallel synchronous system adequate for the purpose of resolving the communication issues mentioned in the previous paragraph? As explained below, the answer is a resounding yes. That is, as long as the processor is fast enough.

Turing's Monster

It is tempting to speculate that, had it not been for our early infatuation with the sanctity of the TCM, we might not be in the sorry mess that we are in today. Software engineers have had to deal with defective software from the very beginning. Computer time was expensive and, as was the practice in the early days, a programmer had to reserve access to a computer days and sometimes weeks in advance. So programmers found themselves spending countless hours meticulously scrutinizing program listings in search of bugs. By the mid 1970s, as software systems grew in complexity and applicability, people in the business began to talk of a reliability crisis. Innovations such as high-level languages, structured and/or object-oriented programming did little to solve the reliability problem. Turing's baby had quickly grown into a monster.

Vested Interest

Software reliability experts (such as the folks at Cigital) have a vested interest in seeing that the crisis lasts as long as possible. It is their raison d'être. Computer scientists and software engineers love Dr. Brooks' ideas because an insoluble software crisis affords them a well-paying job and a lifetime career as reliability engineers. Not that these folks do not bring worthwhile advances to the table. They do. But looking for a breakthrough solution that will produce Brooks' order-of-magnitude improvement in reliability and productivity is not on their agenda. They adamantly deny that such a breakthrough is even possible. Brooks' paper is their new testament and 'no silver bullet' their mantra. Worst of all, most of them are sincere in their convictions.

This attitude (pathological denial) has the unfortunate effect of prolonging the crisis. Most of the burden of ensuring the reliability of software is now resting squarely on the programmer's shoulders. An entire reliability industry has sprouted with countless experts and tool vendors touting various labor-intensive engineering recipes, theories and practices. But more than thirty years after people began to refer to the problem as a crisis, it is worse than ever. As the Technology Review article points out, the cost has been staggering.

There Is a Silver Bullet After All

Reliability is best understood in terms of complexity vs. defects. A program consisting of one thousand lines of code is generally more complex and less reliable than one with a hundred lines of code. Due to its sheer astronomical complexity, the human brain is the most reliable behaving system in the world. Its reliability is many orders of magnitude greater than that of any complex program in existence (see devil's advocate). Any software application with the complexity of the brain would be so riddled with bugs as to be unusable. Conversely, given their low relative complexity, any software application with the reliability of the brain would almost never fail. Imagine how complex it is to be able to recognize someone's face under all sorts of lighting conditions, velocities and orientations. Just driving a car around town (taxi drivers do it all day long, every day) without getting lost or into an accident is incredibly more complex than anything any software program in existence can accomplish. Sure, brains make mistakes, but the things that they do are so complex, especially the myriads of little things that we are oblivious to, that the mistakes pale in comparison to the successes. And when they do make mistakes, it is usually due to physical reasons (e.g., sickness, intoxication, injuries, genetic defects, etc.) or to external circumstances beyond their control (e.g., they did not know). Mistakes are rarely the result of defects in the brain's existing software.

The brain is proof that the reliability of a behaving system (which is what a computer program is) does not have to be inversely proportional to its complexity, as is the case with current software systems. In fact, the more complex the brain gets (as it learns), the more reliable it becomes. But the brain is not the only proof that we have of the existence of a silver bullet. We all know of the amazing reliability of integrated circuits. No one can seriously deny that a modern CPU is a very complex device, what with some of the high-end chips from Intel, AMD and others sporting hundreds of millions of transistors. Yet, in all the years that I have owned and used computers, only once did a CPU fail on me and it was because its cooling fan stopped working. This seems to be the norm with integrated circuits in general: when they fail, it is almost always due to a physical fault and almost never to a defect in the logic. Moore's law does not seem to have had a deleterious effect on hardware reliability since, to my knowledge, the reliability of CPUs and other large scale integrated circuits did not degrade over the years as they increased in speed and complexity.

Deconstructing Brooks' Complexity Arguments

Frederick Brooks' arguments fall apart in one important area. Although Brooks' conclusion is correct as far as the unreliability of complex algorithmic software is concerned, it is correct for the wrong reason. I argue that software programs are unreliable not because they are complex (Brooks' conclusion), but because they are algorithmic in nature. In his paper, Brooks defines two types of complexity, essential and accidental. He writes:

The complexity of software is an essential property, not an accidental one.

According to Brooks, one can control the accidental complexity of software engineering (with the help of compilers, syntax and buffer overflow checkers, data typing, etc.), but one can do nothing about its essential complexity. Brooks then explains why he thinks this essential complexity leads to unreliability:

From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability.

This immediately raises several questions: Why must the essential complexity of software automatically lead to unreliability? Why is this not also true of the essential complexity of other types of behaving systems? In other words, is the complexity of a brain or an integrated circuit any less essential than that of a software program? Brooks is mum on these questions even though he acknowledges in the same paper that the reliability and productivity problem has already been solved in hardware through large-scale integration.

More importantly, notice the specific claim that Brooks is making. He asserts that the unreliability of a program comes from the difficulty of enumerating and/or understanding all the possible states of the program. This is an often repeated claim in the software engineering community but it is fallacious nonetheless. It overlooks the fact that it is equally difficult to enumerate all the possible states of a complex hardware system. This is especially true if one considers that most such systems consist of many integrated circuits that interact with one another in very complex ways. Yet, in spite of this difficulty, hardware systems are orders of magnitude more robust than software systems (see the COSA Reliability Principle for more on this subject).

Brooks backs up his assertion with neither logic nor evidence. But even more disturbing, nobody in the ensuing years has bothered to challenge the validity of the claim. Rather, Brooks has been elevated to the status of a demigod in the software engineering community and his ideas on the causes of software unreliability are now bandied about as infallible dogma.

Targeting the Wrong Complexity

Obviously, whether essential or accidental, complexity is not, in and of itself, conducive to unreliability. There is something inherent in the nature of our software that makes it prone to failure, something that has nothing to do with complexity per se. Note that, when Brooks speaks of software, he has a particular type of software in mind:

The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions.

By software, Brooks specifically means algorithmic software, the type of software which is coded in every computer in existence. Just like Alan Turing before him, Brooks fails to see past the algorithmic model. He fails to realize that the unreliability of software comes from not understanding the true nature of computing. It has nothing to do with the difficulty of enumerating all the states of a program. In the remainder of this article, I will argue that all the effort in time and money being spent on making software more reliable is being targeted at the wrong complexity, that of algorithmic software. And it is a particularly insidious and intractable form of complexity, one which humanity, fortunately, does not have to live with. Switch to the right complexity and the problem will disappear.

The Billion Dollar Question

The billion (trillion?) dollar question is: What is it about the brain and integrated circuits that makes them so much more reliable in spite of their essential complexity? But even more important, can we emulate it in our software? If the answer is yes, then we have found the silver bullet.

The Silver Bullet

Why Software Is Bad

Algorithmic software is unreliable for the following reasons:

Brittleness
  An algorithm is not unlike a chain. Break a link and the entire chain is broken. As a result, algorithmic programs tend to suffer from catastrophic failures even in situations where the actual defect is minor and globally insignificant.
Temporal Inconsistency
  With algorithmic software it is virtually impossible to guarantee the timing of various processes because the execution times of subroutines vary unpredictably. They vary mainly because of a construct called conditional branching, a necessary decision mechanism used in instruction sequences. But that is not all. While a subroutine is being executed, the calling program goes into a coma. The use of threads and message passing between threads does somewhat alleviate the problem but the multithreading solution is way too coarse and unwieldy to make a difference in highly complex applications. And besides, a thread is just another algorithm. The inherent temporal uncertainty (from the point of view of the programmer) of algorithmic systems leads to program decisions happening at the wrong time, under the wrong conditions.
Unresolved Dependencies
  The biggest contributing factor to unreliability in software has to do with unresolved dependencies. In an algorithmic system, the enforcement of relationships among data items (part of what Brooks defines as the essence of software) is solely the responsibility of the programmer. That is to say, every time a property is changed by a statement or a subroutine, it is up to the programmer to remember to update every other part of the program that is potentially affected by the change. The problem is that relationships can be so numerous and complex that programmers often fail to resolve them all.
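The unresolved-dependency problem described above is easy to reproduce. The following is a deliberately buggy, hypothetical example (the `Order` class and its fields are invented for illustration): the relationship between `total`, `price` and `quantity` exists only in the programmer's head, so forgetting one update leaves the program in an inconsistent state without any error being raised.

```python
# Hypothetical example of an unresolved dependency in algorithmic code.
# 'total' depends on 'price' and 'quantity', but nothing enforces it.

class Order:
    def __init__(self, price, quantity):
        self.price = price
        self.quantity = quantity
        self.total = price * quantity  # relationship established once

    def set_quantity(self, quantity):
        self.quantity = quantity
        self.total = self.price * self.quantity  # programmer remembered

    def set_price(self, price):
        self.price = price
        # Bug (intentional): the programmer forgot to recompute self.total.


order = Order(price=10, quantity=2)
order.set_quantity(3)   # total is updated to 30
order.set_price(20)     # total silently stays at 30

# True when the stored total no longer matches its inputs.
stale = order.total != order.price * order.quantity
```

Nothing in the language or the runtime flags the omission in `set_price`; the stale value simply propagates to whatever code reads `total` next.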

Why Hardware Is Good

Brains and integrated circuits are, by contrast, parallel signal-based systems. Their reliability is due primarily to three reasons:

Strict Enforcement of Signal Timing through Synchronization
  Neurons fire at the right time, under the right temporal conditions. Timing is consistent because of the brain's synchronous architecture (**). A similar argument can be made with regard to integrated circuits.
Distributed Concurrent Architecture
  Since every element runs independently and synchronously, the localized malfunctions of a few (or even many) elements will not cause the catastrophic failure of the entire system.
Automatic Resolution of Event Dependencies
  A signal-based synchronous system makes it possible to automatically resolve event dependencies. That is to say, every change in a system's variable is immediately and automatically communicated to every object that depends on it.
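One way to picture the timing discipline described above is a two-phase, clocked update: every element signaled in the current cycle is processed, and the signals it emits are delivered together at the start of the next cycle. The sketch below is an assumption-laden toy, not the COSA execution model; `SyncNetwork` and its methods are hypothetical names.

```python
# Illustrative assumption: a tiny two-phase ("double-buffered") scheduler
# that emulates synchronous signal delivery on a sequential machine.

class SyncNetwork:
    def __init__(self):
        self.links = {}       # element name -> list of downstream names
        self.current = set()  # elements signaled for the current cycle
        self.cycle = 0

    def connect(self, source, *targets):
        """One-to-many connectivity: a source may feed several targets."""
        self.links.setdefault(source, []).extend(targets)

    def signal(self, name):
        self.current.add(name)

    def step(self):
        """Process every signaled element, then deliver all of their
        outputs at once for the next cycle."""
        fired = sorted(self.current)
        upcoming = set()
        for name in fired:
            upcoming.update(self.links.get(name, []))
        self.current = upcoming
        self.cycle += 1
        return fired


net = SyncNetwork()
net.connect("A", "B", "C")  # A signals B and C simultaneously
net.connect("B", "D")
net.connect("C", "D")
net.signal("A")
trace = [net.step() for _ in range(4)]
```

Because outputs are buffered until the end of the cycle, the cycle in which an element fires depends only on the wiring, never on the order in which the scheduler happens to visit elements within a cycle.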

Programs as Communication Systems

Although we are not accustomed to thinking of it as such, a computer program is, in reality, a communication system. During execution, every statement or instruction in an algorithmic procedure essentially sends a signal to the next statement, saying: 'I'm done, now it's your turn.' A statement should be seen as an elementary object having a single input and a single output. It waits for an input signal, does something, and then sends an output signal to the next object. Multiple objects are linked together to form a one-dimensional (single path) sequential chain. The problem is that, in an algorithm, communication is limited to only two objects at a time, a sender and a receiver. Consequently, even though there may be forks (conditional branches) along the way, a signal may only take one path at a time.

My thesis is that this mechanism is too restrictive and leads to unreliable software. Why? Because there are occasions when a particular event or action must be communicated to several objects simultaneously. This is known as an event dependency. Algorithmic development environments make it hard to attach orthogonal signaling branches to a sequential thread and therein lies the problem. The burden is on the programmer to remember to add code to handle delayed reaction cases: something that occurred previously in the procedure needs to be addressed at the earliest opportunity by another part of the program. Every so often we either forget to add the necessary code (usually, a call to a subroutine) or we fail to spot the dependency.

Event Dependencies and the Blind Code Problem

The state of a system at any given time is defined by the collection of properties (variables) that comprise the system's data, including the data contained in input/output registers. The relationships or dependencies between properties determine the system's behavior. A dependency simply means that a change in one property (also known as an event) must be followed by a change in one or more related properties. In order to ensure flawless and consistent behavior, it is imperative that all dependencies are resolved during development and are processed in a timely manner during execution. It takes intimate knowledge of an algorithmic program to identify and remember all the dependencies. Due to the large turnover in the software industry, programmers often inherit strange legacy code which aggravates the problem. Still, even good familiarity is not a guarantee that all dependencies will be spotted and correctly handled. Oftentimes, a program is so big and complex that its original authors completely lose sight of old dependencies. Blind code leads to wrong assumptions which often result in unexpected and catastrophic failures. The problem is so pervasive and so hard to fix that most managers in charge of maintaining complex mission-critical software systems will try to find alternative ways around a bug that do not involve modifying the existing code.

The Cure for Blind Code

To cure code blindness, all objects in a program must, in a sense, have eyes in the back of their heads. What this means is that every event (a change in a data variable) occurring anywhere in the program must be detected and promptly communicated to every object that depends on it. The cure consists of three remedies, as follows:

Automatic Resolution of Event Dependencies
  The problem of unresolved dependencies can be easily solved in a change-driven system through the use of a technique called dynamic pairing whereby change detectors (comparison sensors) are associated with related operators (effectors). This way, the development environment can automatically identify and resolve every dependency between sensors and effectors, leaving nothing to chance.
One-to-many Connectivity
  One of the factors contributing to blind code in algorithmic systems is the inability to attach one-to-many orthogonal branches to a thread. This problem is non-existent in a synchronous system because every signal can be channeled through as many pathways as necessary. As a result, every change to a property is immediately broadcast to every object that is affected by the change.
Immediacy
  During the processing of any element in an algorithmic sequence, all the other elements in the sequence are disabled. Thus, any change or event that may require the immediate attention of either preceding or succeeding elements in the chain is ignored. Latency is a major problem in conventional programs. By contrast, immediacy is an inherent characteristic of synchronous systems. 
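The three remedies above can be approximated, very roughly, in conventional code by routing every write through a store that processes the sensors paired with the written variable. The sketch below is one possible reading of dynamic pairing, stated here as an assumption; `Store`, `add_sensor` and `write` are hypothetical names and do not come from the COSA project.

```python
# Hypothetical sketch: sensors are paired with a variable once, and every
# effector write to that variable processes them automatically.
# This is an interpretation for illustration, not the COSA mechanism.

class Store:
    def __init__(self):
        self.values = {}
        self.sensors = {}  # variable name -> list of (predicate, callback)

    def add_sensor(self, name, predicate, callback):
        """Pair a comparison sensor with every future write to `name`."""
        self.sensors.setdefault(name, []).append((predicate, callback))

    def write(self, name, value):
        """Effector entry point: store the value, then immediately
        process every sensor paired with this variable."""
        self.values[name] = value
        for predicate, callback in self.sensors.get(name, []):
            if predicate(value):
                callback(value)


log = []
store = Store()
# Two independent dependents of the same variable (one-to-many).
store.add_sensor("temperature", lambda t: t > 100,
                 lambda t: log.append(("alarm", t)))
store.add_sensor("temperature", lambda t: t > 100,
                 lambda t: log.append(("shutdown", t)))

store.write("temperature", 90)    # no sensor condition is met
store.write("temperature", 120)   # both dependents are notified at once
```

The code that writes `temperature` does not need to know who depends on it; adding a third dependent requires no change to any existing writer.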

Software Design vs. Hardware Design

All the good things that are implicit and taken for granted in hardware logic design become explicit and a constant headache for the algorithmic software designer. The blindness problem that afflicts conventional software simply does not exist in electronic circuits. The reason is that hardware is inherently synchronous. This makes it easy to add orthogonal branches to a circuit. Signals are thus promptly dispatched to every element or object that depends on them. Furthermore, whereas sensors (comparison operators) in software must be explicitly associated with relevant effectors and invoked at the right time, hardware sensors are self-processing. That is, a hardware sensor works independently of the causes of the phenomenon (change) it is designed to detect. As a result, barring a physical failure, it is impossible for a hardware system to fail to notice an event.

By contrast, in software, sensors must be explicitly processed in order for a change to be detected. The result of a comparison operation is likely to be useless unless the operator is called at the right time, i.e., immediately after or concurrent with the change. As mentioned previously, in a complex software system, programmers often fail to update all relevant sensors after a change in a property. Is it any wonder that logic circuits are so much more reliable than software programs?

As Jiantao Pan points out in his excellent paper on software reliability, "hardware faults are mostly physical faults, while software faults are design faults, which are harder to visualize, classify, detect, and correct." This raises the question: Why can't software engineers do what hardware designers do? In other words, why can't software designers design software the same way hardware designers design hardware? (Note that, by hardware design, I mean the design of the hardware's logic). When hardware fails, it is almost always due to some physical malfunction, and almost never to a problem in the underlying logic. Since software has no physical faults and only design faults, by adopting the synchronous reactive model of hardware logic design, we can bring software reliability to at least a level on a par with that of hardware. Fortunately for software engineering, all the advantages of hardware can also be made intrinsic to software. And it can be done in a manner that is completely transparent to the programmer.

Thinking of Everything

When it comes to safety-critical applications such as air traffic control or avionics software systems, even a single defect is not an option since it is potentially catastrophic. Unless we can guarantee that our programs are logically consistent and completely free of defects, the reliability problem will not go away. In other words, extremely reliable software is just not good enough. What we need is 100% reliable software. There is no getting around this fact. 

Jeff Voas, a leading proponent of the 'there is no silver bullet' movement and a co-founder of Cigital, a software-reliability consulting firm in Dulles, VA, once said that "it's the things that you never thought of that get you every time." It is true that one cannot think of everything, especially when working with algorithmic systems. However, it is also true that a signal-based, synchronous program can be put together in such a way that all internal dependencies and incompatibilities are spotted and resolved automatically, thus relieving the programmer of the responsibility to think of them all. In addition, since all conditions to which the program is designed to react are explicit, they can all be tested automatically before deployment. Guaranteed bug-free software is an essential aspect of the COSA Project and the COSA operating system. Refer to the COSA Reliability Principle for more on this topic.

Addendum (3/5/2006) The COSA software model makes it possible to automatically find design inconsistencies in a complex program based on temporal constraints. There is a simple method that will ensure that a complex software system is free of internal logical contradictions. With this method, it is possible to increase design correctness simply by increasing complexity. The consistency mechanism can find all temporal constraints in a complex program automatically, while the program is running. The application designer is given the final say as to whether or not any discovered constraint is retained.

Normally, logical consistency is inversely proportional to complexity. The COSA software model introduces the rather counterintuitive notion that higher complexity is conducive to greater consistency. The reason is that both complexity and consistency increase with the number of constraints without necessarily adding to the system's functionality. Any new functionality will be forced to be compatible with the existing constraints while adding new constraints of its own, thereby increasing design correctness and application robustness. Consequently, there is no limit to how complex our future software systems will be. Eventually, time permitting, I will add a special page to the site to explain the constraint discovery mechanism, as it is a crucial part of the COSA model.
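
The discovery mechanism itself is not described on this page, so the following is only one possible reading of the idea, written as a minimal Python sketch. It assumes that a running program can be observed as traces of named events, and it treats "event a always occurs before event b" as the only kind of temporal constraint. The function names, the event names, and the trace format are all hypothetical and are not taken from COSA.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple

Constraint = Tuple[str, str]  # (a, b) means "a occurs before b"


def first_positions(trace: List[str]) -> Dict[str, int]:
    """Map each event name to the position of its first occurrence."""
    first: Dict[str, int] = {}
    for pos, event in enumerate(trace):
        first.setdefault(event, pos)
    return first


def discover_constraints(traces: Iterable[List[str]]) -> Set[Constraint]:
    """Collect every ordering (a, b) that held in all traces containing both."""
    seen_together: Set[Constraint] = set()
    violated: Set[Constraint] = set()
    for trace in traces:
        first = first_positions(trace)
        for a, b in permutations(first, 2):
            seen_together.add((a, b))
            if first[a] > first[b]:
                violated.add((a, b))
    return seen_together - violated


def violations(trace: List[str], constraints: Set[Constraint]) -> List[Constraint]:
    """List the retained constraints that a new trace breaks."""
    first = first_positions(trace)
    return [(a, b) for (a, b) in sorted(constraints)
            if a in first and b in first and first[a] > first[b]]


# Hypothetical observations of a running program.
observed = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "write", "close"],
    ["open", "write", "read", "close"],
]
discovered = discover_constraints(observed)

# The designer has the final say; here every discovered constraint is retained.
retained = set(discovered)
broken = violations(["read", "open", "close"], retained)
```

In this toy run, no ordering between "read" and "write" is discovered because both orders were observed, while a later trace that reads before opening is flagged against the retained constraints.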

 

Plug-Compatible Components

Many have suggested that we should componentize computer programs in the hope of doing for software what integrated circuits did for hardware. Indeed, componentization is a giant step in the right direction but, even though the use of software components (e.g., Microsoft's ActiveX controls, JavaBeans, C++ objects, etc.) in the last decade has taken much of the pain out of programming, the reliability problem is still with us. The reason should be obvious: software components are constructed with things that are utterly alien to a hardware IC designer: algorithms. Also, a thoroughly tested algorithmic component may work fine in one application but fail in another. The reason is that its temporal behavior is not consistent. It varies from one environment to another. This problem does not exist in a synchronous model, which makes it ideal as a platform for components.

Another known reason for bad software has to do with compatibility. In the brain, signal pathways are not connected willy-nilly. Connections are made according to their types. Refer, for example, to the retinotopic mapping of the visual cortex: signals from a retinal ganglion cell ultimately reach a specific neuron in the visual cortex, all the way in the back of the brain. This is accomplished via a biochemical identification mechanism during the brain's early development. It is a way of enforcing compatibility between connected parts of the brain. We should follow nature's example and use a strict typing mechanism in our software in order to ensure compatibility between communicating objects. All message connectors should have unique message types, and all connectors should be unidirectional, i.e., they should be either male (sender) or female (receiver). This will eliminate mix-ups and ensure robust connectivity. The use of libraries of pre-built components will automate over 90% of the software development process and turn everyday users into software developers. These plug-compatible components should snap together automatically: just click, drag and drop. Thus the burden of assuring compatibility is the responsibility of the development system, not the programmer.

Some may say that typed connectors are not new and they are correct. Objects that communicate via connectors have indeed been tried before, and with very good results. However, as mentioned earlier, in a pure signal-based system, objects do not contain algorithms. Calling a function in a C++ object is not the same as sending a typed signal to a synchronous component. The only native (directly executable) algorithmic code that should exist in the entire system is a small microkernel. No new algorithmic code should be allowed since the microkernel runs everything. Furthermore, the underlying parallelism and the signaling mechanism should be implemented and enforced at the operating system level in such a way as to be completely transparent to the software designer. (Again, see the COSA Operating System for more details on this topic).
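
As a rough illustration of the typed, unidirectional connectors described above, here is a minimal Python sketch. The names Sender, Receiver and IncompatibleConnection, as well as the message type strings, are hypothetical and chosen only for this example; they are not part of COSA. Python code is itself algorithmic, so the sketch shows only the compatibility check made at connection time, not a signal-based implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class IncompatibleConnection(Exception):
    """Raised when two connectors cannot be plugged together."""


@dataclass
class Receiver:
    """Female connector: accepts signals of exactly one message type."""
    message_type: str
    handler: Callable[[object], None]


@dataclass
class Sender:
    """Male connector: emits signals of exactly one message type."""
    message_type: str
    targets: List[Receiver] = field(default_factory=list)

    def connect(self, receiver: Receiver) -> None:
        # Compatibility is checked once, when the components are plugged
        # together, so a mismatch can never surface later at run time.
        if not isinstance(receiver, Receiver):
            raise IncompatibleConnection("a sender can only plug into a receiver")
        if receiver.message_type != self.message_type:
            raise IncompatibleConnection(
                f"{self.message_type!r} cannot plug into {receiver.message_type!r}"
            )
        self.targets.append(receiver)

    def emit(self, payload: object) -> None:
        # Signals flow one way only: from the sender to its receivers.
        for receiver in self.targets:
            receiver.handler(payload)


# Hypothetical components wired together by a development tool.
log: List[object] = []
temperature_out = Sender("temperature.celsius")
display_in = Receiver("temperature.celsius", log.append)
alarm_in = Receiver("pressure.kpa", log.append)

temperature_out.connect(display_in)  # same type: accepted
temperature_out.emit(21.5)

rejected = False
try:
    temperature_out.connect(alarm_in)  # different type: refused
except IncompatibleConnection:
    rejected = True
```

The point of the design is that the check lives in the connector, which stands in for the development system, so the person assembling components never has to verify compatibility by hand.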

Event Ordering Is Critical

Consistent timing is vital to reliability but the use of algorithms plays havoc with event ordering. To ensure consistency, the prescribed scheduling of every operation or action in a software application must be maintained throughout the life of the application, regardless of the host environment. Nothing should be allowed to happen before or after its time. In a signal-based, synchronous software development environment, the enforcement of order must be deterministic in the sense that every reaction must be triggered by precise, predetermined and explicit conditions. Luckily, this is not something that developers need to be concerned with because it is a natural consequence of the system's parallelism. Note that the term 'consistent timing' does not mean that operations must be synchronized to a real time clock (although they may). It means that the prescribed logical or relative order of operations must be enforced automatically and maintained throughout the life of the system.

Von Neumann Architecture

The astute reader may point out that the synchronous nature of hardware cannot be truly duplicated in software because the latter is inherently sequential due to the von Neumann architecture of our computers. This is true but, thanks to the high speed of modern processors, we can easily emulate (although not truly simulate) the parallelism of integrated circuits in software. This is not new. We already emulate nature's parallelism in our artificial neural networks, cellular automata, computer spreadsheets, video games and other types of applications consisting of large numbers of entities running concurrently. The technique is simple: Essentially, within any given processing cycle or frame interval, a single fast central processor does the work of many small virtual processors residing in memory.
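
The technique can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the COSA kernel: the ring of cells, the XOR rule, and the function names run_cycle and run are all invented for the example. Each cell plays the role of a virtual processor. Within one cycle, every cell reads only the previous cycle's state and writes into a separate buffer, so the order in which the single real processor visits the cells has no effect on the outcome.

```python
from typing import List, Optional, Sequence


def run_cycle(state: List[int], order: Sequence[int]) -> List[int]:
    """Advance every virtual processor by one cycle.

    Reads come from `state` (the previous cycle) and writes go to a new
    buffer, so all cells appear to react at the same instant.
    """
    n = len(state)
    next_state = [0] * n
    for i in order:
        left = state[(i - 1) % n]
        right = state[(i + 1) % n]
        next_state[i] = left ^ right  # toy rule: XOR of the two neighbours
    return next_state


def run(state: List[int], cycles: int,
        order: Optional[Sequence[int]] = None) -> List[int]:
    """Run a number of cycles, visiting the cells in the given order."""
    if order is None:
        order = range(len(state))
    for _ in range(cycles):
        state = run_cycle(state, order)
    return state


start = [0, 0, 0, 1, 0, 0, 0, 0]
forward = run(start, 3)
backward = run(start, 3, order=list(reversed(range(len(start)))))
same_result = forward == backward
```

Visiting the cells front to back or back to front yields the same state after every cycle, which is the property that lets a sequential processor stand in for many concurrent ones.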

One may further argue that in an emulated parallel system, the algorithms are still there even if they are not visible to the developer, and that therefore, the unreliability of algorithmic software cannot be avoided. This would be true if unreliability were due to the use of a single algorithm or even a handful of them. This is neither what is observed in practice nor what is being claimed in this article. It is certainly possible to create one or more flawless algorithmic procedures. We do it all the time. The unreliability comes from the unbridled proliferation of procedures, the unpredictability of their interactions, and the lack of a surefire method with which to manage and enforce dependencies (see the blind code discussion above).

As mentioned previously, in a synchronous software system, no new algorithmic code is ever allowed. The only pure algorithm in the entire system is a small, highly optimized and thoroughly tested execution kernel which is responsible for emulating the system's parallelism. The strict prohibition against the deployment of new algorithmic code effectively guarantees that the system will remain stable.

Software ICs with a Twist

In a 1995 article titled "What if there's a Silver Bullet..." Dr. Brad Cox wrote the following:

Building applications (rack-level modules) solely with tightly-coupled technologies like subroutine libraries (block-level modules) is logically equivalent to wafer-scale integration, something that hardware engineering can barely accomplish to this day. So seven years ago, Stepstone began to play a role analogous to the silicon chip vendors, providing chip-level software components, or Software-ICs[TM], to the system-building community.

While I agree with the use of modules for software composition, I take issue with Dr. Cox's analogy, primarily because subroutine libraries have no analog in integrated circuit design. The biggest difference between hardware and conventional software is that the former operates in a synchronous, signal-based universe where timing is systematic and consistent, whereas the latter uses algorithmic procedures which result in haphazard timing.

Achieving true logical equivalence between software and hardware necessitates a signal-based, synchronous software model. In other words, software should not be radically different from hardware. Rather, it should serve as an extension to it. It should emulate the functionality of hardware by adding only what is lacking: flexibility and ease of modification. In the future, when we develop technologies for non-von Neumann computers that can sprout new physical signal pathways and new self-processing objects on the fly, the operational distinction between software and hardware will no longer be valid.

As an aside, it is my hope that the major IC manufacturers (Intel, AMD, Motorola, Texas Instruments, Sun Microsystems, etc.) will soon recognize the importance of synchronous software objects and produce highly optimized CPUs designed specifically for this sort of parallelism. This way, the entire execution kernel could be made to reside on the CPU chip. This would not only completely eliminate the need for algorithmic code in program memory but would also result in unparalleled performance. See the description of the COSA Operating System Kernel for more on this.

Failure Localization

An algorithmic program is like a chain, and like a chain, it is only as strong as its weakest link. Break any link and the entire chain is broken. This brittleness can be somewhat alleviated by the use of multiple parallel threads. A malfunctioning thread usually does not affect the proper functioning of the other threads. Failure localization is a very effective way to increase a system's fault tolerance. But the sad reality is that, even though threaded operating systems are the norm in the software industry, our systems are still susceptible to catastrophic failures. Why? The answer is that threads do not entirely eliminate algorithmic coding. They encapsulate algorithms into concurrent programs running on the same computer. Another even more serious problem with threads is that they are, by necessity, asynchronous. Synchronous processing (in which all elementary operations have equal durations and are synchronized to a common clock) is a must for reliability.

Threads can carry a heavy price because of the performance overhead associated with context switching. Increasing the number of threads in a system so as to encapsulate and parallelize elementary operations quickly becomes unworkable. The performance hit would be tremendous. Fortunately, there is a simple parallelization technique that does away with threads altogether. It is commonly used in such applications as cellular automata, neural networks, and other simulation-type programs. See the COSA Operating System for more details.

Boosting Productivity

The notion that the computer is merely a machine for the execution of instruction sequences is a conceptual disaster. The computer should be seen as a behaving system, i.e., a collection of synchronously interacting objects. The adoption of a synchronous model will improve productivity by several orders of magnitude for the following reasons:

Visual Software Composition
  The synchronous model lends itself superbly to a graphical development environment for software composition. It is much easier to grasp the meaning of a few well-defined icons than it is to decipher dozens of keywords in a language which may not even be one's own. It takes less mental effort to follow signal activation pathways on a diagram than to unravel someone's obscure algorithmic code spread over multiple files. The application designer can get a better feel for the flow of things because every signal propagates from one object to another using a unidirectional pathway. A drag-and-drop visual composition environment not only automates a large part of software development, it also eliminates the usual chaos of textual environments by effectively hiding away any information that lies below the current level of abstraction. For more information, see Software Composition in COSA.
Complementarity
  One of the greatest impediments to software productivity is the intrinsic messiness of algorithmic software. Although the adoption of structured code and object-oriented programming in the last century was a significant improvement, one could never quite achieve a true sense of order and completeness. There is a secure satisfaction one gets from a finished puzzle in which every element fits perfectly. This sort of order is a natural consequence of what I call the principle of complementarity. Nothing brings order into chaos like complementarity. Fortunately, the synchronous model is an ideal environment for an organizational approach which is strictly based on complementarity. Indeed, complementarity is the most important of the basic principles underlying Project COSA.
Fewer Bugs
  The above gains will be due to a marked increase in clarity and comprehensibility. But what will drastically boost productivity will be the smaller number of bugs to fix. It is common knowledge that the average programmer's development time is spent mostly in testing and debugging. The use of snap-together components (click, drag and drop) will automate a huge part of the development process while preventing and eliminating all sorts of problems associated with incompatible components. In addition, development environments will contain debugging tools that will find, correct and prevent all the internal design bugs automatically. A signal-based, synchronous environment will facilitate safe, automated software development and will open up computer programming to the lay public.
 

Conclusion

Slaying the Werewolf

Unreliable software is the most urgent issue facing the computer industry. Reliable software is critical to the safety, security and prosperity of the modern computerized world. Software has become too much a part of our everyday lives to be entrusted to the vagaries of an archaic and hopelessly flawed paradigm. We need a new approach based on a rock-solid foundation, an approach worthy of the twenty-first century. And we need it desperately! We simply cannot afford to continue doing business as usual. Frederick Brooks is right about one thing: there is indeed no silver bullet that can solve the reliability problem of complex algorithmic systems. But what Brooks and others fail to consider is that his arguments apply only to the complexity of algorithmic software, not to that of behaving systems in general. In other words, the werewolf is not complexity per se but algorithmic software. The bullet should be used to slay the beast once and for all, not to alleviate the symptoms of its incurable illness.

Rotten at the Core

In conclusion, we can solve the software reliability and productivity crisis. To do so, we must acknowledge that there is something rotten at the core of software engineering. We must understand that using the algorithm as the basis of computer programming is the last of the stumbling blocks that are preventing us from achieving an effective and safe componentization of software comparable to what has been done in hardware. It is the reason that current quality control measures will always fail in the end. To solve the crisis, we must adopt a synchronous, signal-based software model. Only then will our software programs be guaranteed free of defects, irrespective of their complexity.

Next: Project COSA

 

* This is not to say that algorithmic solutions are bad or that they should not be used, but that the algorithm should not be the basis of software construction. A purely algorithmic procedure is one in which communication is restricted to only two elements or statements at a time. In a non-algorithmic system, the number of elements that can communicate simultaneously is only limited by physical factors.
 

** A synchronous system is one in which all objects are active at the same time. This does not mean that all signals must be generated simultaneously. It means that every object reacts to its related events immediately, i.e., without delay. The end result is that the timing of reactions is deterministic.

 

2004-2006 Louis Savain

Copy and distribute freely