Writings on civilisation's systems and their limits

Tuesday 22 February 2011

Introduction

G'day, my name's Andrew. I'm a 34 year old digital signal processing (DSP) engineer, distance runner, environmentalist, skier and bike nut. My job involves writing software for sound processing on DSP chips - mobile phones, Bluetooth headsets and hearing aids all use audio DSP processing. I earned an engineering/science undergraduate degree from Melbourne Uni in 1999, with physics and mathematics for the science component. This was followed by a two and a half year stint of structural and acoustic engineering in an engineering consultancy, then three years of grant administration with State Government, before I joined another engineering consultancy for a year at the coal face, this time doing mechanical services and acoustics work. At that point I decided that something a bit more technical might be my thing, so I enrolled in a Masters in electronic engineering at RMIT, which saw me into my present job, where I've been for three years and which suits me very nicely!

I've read quite a few of the Australian economic bloggers (Unconventional Economist, Delusional Economics, Observations of an Economist Environmentalist, Houses and Holes, Tasmanian Real Estate Trouble, Critical Influence, Billy Blog), in addition to a few overseas ones (Oil Drum, Automatic Earth and Energy Bulletin), and many books on systems, finance and energy. I've also done a fair stack of reading recently (on and off line) in an effort to understand the actual workings of the Australian monetary system.

In November 2010, a coalition of some of the abovementioned blogs (UE, DE, H&H plus the Australia Institute) launched an essay competition, named Son of Wallis, to invite people to submit essays on debt, securitisation, competition and stability in the Australian banking sector. The competition name was a reference to an earlier Australian financial system inquiry (the Wallis Report), which was presented to the Treasurer in March 1997. The reuse of this name was to make the point that further reform of the Australian banking system is needed, and that new ideas are also needed. I wrote a submission to this competition.

At about the same time, I also made a separate submission to the Senate Inquiry into Competition within the Australian banking sector. 111 submissions were received, and many of them make interesting reading. My own is submission #65.

Some time last year (2010) I had a few ideas buzzing around in the back of my head, and these led me to reserve this blog page. With a few other things competing for my time, the ideas multiplied and were jotted down in various places, but I never quite got to the point of sitting down and pulling them all together. However, the act of writing these two submissions, combined with a week of running, reading and writing in one of my favourite places - Falls Creek - over the end of year break, led me to crank things up and start writing!

There are quite a few economists and finance people already writing excellent blogs. As an engineer with mathematical training and an interest in systems theory and behaviour, I'm hoping that Systems & Limits will be a logical addition to those blogs, rather than a competitor to them.


System - a definition

The word "system" has several meanings. The one I have in mind is this: "complex whole, set of connected things or parts, organized body of material or immaterial things" (Concise Oxford). Our civilisation is made up of many systems - physical structures, health care, finance, education, social welfare, transportation, technology, political, justice, agriculture, energy production, energy generation and administrative systems are just a few examples.

Systems contain components, which respond to their inputs by modifying their internal states and generating outputs, according to specific sets of rules. System components can be anything - from steel beams in skyscrapers, to mobile phones in mobile phone networks, to individual account holders in a banking system, to shops in a food distribution system. Systems also have connections between components, by means of which the state of one component can affect the state of another component. These connections can include loads and forces in civil structures, signals in mobile phone networks, money and credit flows in a banking system, or food supply flows in a food distribution system.
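
To make this a little more concrete, here's a minimal sketch of such a model in code (Python, with all names and numbers being my own illustrative choices rather than anything standard):

    # A minimal sketch of a system as components plus connections.
    # Each component holds an internal state and a rule that maps
    # (inputs, state) to (new state, output).

    class Component:
        def __init__(self, name, state, rule):
            self.name = name
            self.state = state
            self.rule = rule        # rule(inputs, state) -> (new_state, output)
            self.output = 0.0

        def step(self, inputs):
            self.state, self.output = self.rule(inputs, self.state)

    class System:
        def __init__(self, components, connections):
            self.components = {c.name: c for c in components}
            self.connections = connections   # list of (source, destination) pairs

        def step(self):
            # Gather each component's inputs from its connected sources,
            # then update every component once (a synchronous update).
            inputs = {name: [] for name in self.components}
            for src, dst in self.connections:
                inputs[dst].append(self.components[src].output)
            for name, comp in self.components.items():
                comp.step(inputs[name])

    # Example: a constant source feeding an accumulator.
    source = Component("source", state=None, rule=lambda ins, s: (s, 1.0))
    accum = Component("accum", state=0.0,
                      rule=lambda ins, s: (s + sum(ins), s + sum(ins)))
    net = System([source, accum], [("source", "accum")])
    for _ in range(3):
        net.step()
    print(accum.state)   # 2.0 - outputs propagate with a one-step delay

The point is not the code itself, but the shape: states, rules and connections are all a system model really needs.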

I believe that although the type and scale of systems can vary widely, systems in general share many characteristics and attributes, and that as a result, principles and concepts can be developed to aid the study of any given system.

All systems exist within our natural environment and are a subset of it. They cannot continue to exist without this sustaining supersystem. The idea that the environment is a subset of the economic system - often implicit rather than explicitly stated - is prevalent, but wrong. As the enormous graffitied slogan on the chimney of the Spencer St power station in the Melbourne CBD used to say - "No Jobs on a Dead Planet"! Unfortunately, the power station, chimney and prescient message were demolished in late 2007.

The common feature of all human-created systems is that they are created for a specific purpose - what I call a system service. In contrast, natural systems are not purpose driven - they are just the way they are! The purpose-driven nature of human-created systems allows us to define system failure as the failure of a system to achieve the purpose for which it was created.

In studying systems, it is useful to create a conceptual model of a system in order to investigate and explain behaviour. The objective in creating a system model is to develop a model which is a simplification of the real system, but still a reasonable representation of system behaviour. System models can be verbal descriptions, scale models, mathematical models or PC simulations, amongst other forms.

Systems can be represented recursively - as systems of systems of systems, much like Russian dolls. However, this can lead to unnecessarily complex system models. When modelling an arbitrary system, the level of detail in the system model should be selected appropriately for the system behaviour under investigation. For instance, when looking at data flows in a PC network and network capacities, PCs are sensible element choices for the system model, while the network links can be represented by connections between PC elements. As this model includes individual data sources and destinations, as well as network structure, it has sufficient resolution to determine data volumes in the network links. When looking at individual PC behaviour when infected by a virus transmitted over the network, it might be more appropriate to represent the infected PC as a sub-system, where the complex design of the operating system is represented in a fashion that allows the interactions between the virus and the operating system to be captured.
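
As a toy illustration of the first level of resolution - PCs as elements, links as connections - here's how the link data volumes might be tallied (Python; the traffic figures and network layout are made up):

    # Sketch: PCs as nodes, links as edges, and per-pair traffic demands.
    # At this resolution the model can answer "how much data does each
    # link carry?" but says nothing about what happens inside a PC.

    traffic = {                  # (source, destination): MB per hour (made up)
        ("pc_a", "pc_b"): 120,
        ("pc_a", "pc_c"): 30,
        ("pc_b", "pc_c"): 75,
    }
    routes = {                   # the path each flow takes, as a list of links
        ("pc_a", "pc_b"): [("pc_a", "switch"), ("switch", "pc_b")],
        ("pc_a", "pc_c"): [("pc_a", "switch"), ("switch", "pc_c")],
        ("pc_b", "pc_c"): [("pc_b", "switch"), ("switch", "pc_c")],
    }

    link_load = {}
    for flow, volume in traffic.items():
        for link in routes[flow]:
            link_load[link] = link_load.get(link, 0) + volume

    print(link_load)   # e.g. ("switch", "pc_c") carries 30 + 75 = 105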

Whenever we investigate some aspect of a system's behaviour, the representative model must have component scales selected for relevance to the behaviour under consideration. I propose to call this the system representation scale (SRS) for the behaviour under investigation. The SRS for an arbitrary problem may fall anywhere between multi-billion-star galaxies in galactic superclusters, when considering the Hubble Constant, and quarks, leptons and bosons (the subatomic particles making up atoms), when considering the output of high energy collisions at the Large Hadron Collider in Switzerland. Virtually all system behaviours of possible human interest will have SRSs that fall somewhere between these two extremes.


Engineered structures - the Upper and Lower Bound theorems

My introduction to systems study was during my undergraduate structural engineering studies. Our primary concern was identifying the induced forces in loaded structures, and then later determining the strength limits of structures, or designing structures to withstand specified loads. On one level, structural systems are quite simple, and simple structural analyses can be performed. On another level, however, we do not have a perfect understanding of materials, or of structures, so our structural analyses will only ever be approximations to the reality of the built structure in use. In addition, many structures are indeterminate for applied loads, meaning there is more than one possible load path; this makes the analysis harder, since it is not obvious what proportion of the applied load each path carries.

This does not mean that it is not possible to design structures safely - quite the contrary! The reason is that we can make conservative assumptions about the properties of materials and about the loads to be imposed, and combine these with the upper and lower bound theorems to derive a safe structural design. I believe some of the ideas contained in these theorems have wider validity, applying to non-structural systems too, so I'll describe them in a little more depth.

The upper bound theorem can be stated as follows: "A collapse load computed on the basis of an assumed mechanism will always be greater than or equal to the true collapse load" (p 121, Ductile Design of Steel Structures).

This can be restated in more general terms, as follows:

If you assume a way in which a structure can fail (ie, specify points where the structure will fail, or "break" in some fashion), it is possible to calculate, using the structure geometry and assumed structural element capacities, the amount of imposed force required to make that structure fail *in that failure mode*. For the loaded structure in the figure below, two possible failure modes are shown - A and B. There is an infinite number of geometrically possible failure modes, although only one will actually occur when the structure is loaded.

For any given failure mode, the corresponding failure load is determined by the structure geometry and structural element capacities - this is relatively easy to calculate. The upper bound theorem states that the failure load computed for each of these failure modes will always be greater than or equal to the actual failure load of the structure. The result is that the upper bound theorem allows us to calculate upper bounds on the strength of a structure, even if we don't know its specific failure mode.

In the figure below, the force required to induce failure will differ between failure mode A and failure mode B. Both will be upper bounds on the actual failure load of the structure, but the lower of the two will be the better approximation.

[Figure: a structure under an imposed load, with two assumed failure modes, A and B]
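
The theorem can also be demonstrated numerically. The sketch below is my own illustrative example (not taken from the referenced text): a fixed-ended beam carrying a point load, where we assume a collapse mechanism with plastic hinges at both supports plus one interior hinge at a trial position x, and use virtual work to compute the collapse load implied by each assumed mechanism. Every trial mechanism gives a load at or above the true collapse load, and the best (lowest) estimate occurs when the assumed hinge coincides with the load point.

    # Upper bound sketch: fixed-ended beam of span L and plastic moment Mp,
    # carrying a point load at distance a from the left support.  Assume a
    # collapse mechanism with hinges at both supports and at position x,
    # then use virtual work to get the collapse load that mechanism implies.

    L, Mp, a = 10.0, 100.0, 4.0            # made-up units: m, kNm, m

    def upper_bound(x):
        # Give the assumed interior hinge a unit virtual deflection.
        theta_left, theta_right = 1.0 / x, 1.0 / (L - x)
        # Deflection under the load point for this mechanism.
        d_load = a / x if a <= x else (L - a) / (L - x)
        # Internal work: hinge rotations at the left support, the right
        # support, and the interior hinge (theta_left + theta_right).
        internal = 2.0 * Mp * (theta_left + theta_right)
        return internal / d_load

    candidates = [upper_bound(x) for x in [2.0, 4.0, 6.0, 8.0]]
    true_load = 2.0 * Mp * L / (a * (L - a))   # exact: hinge at the load point
    print(min(candidates), true_load)          # the best mechanism matches it
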
The lower bound theorem can be stated as follows: "A collapse load computed on the basis of an assumed moment diagram in which the moments are nowhere greater than Mp is less than or equal to the true collapse load" (p 121, Ductile Design of Steel Structures). Here Mp is the plastic moment capacity of the member - the bending moment at which a plastic hinge forms.

A plain-language translation of the lower bound theorem is this: if you assume a load path for an imposed load in a structure, and calculate the maximum load that can be applied *for this load path* without overloading any component in the path beyond its strength, then this maximum load will be a lower bound on the load required to induce collapse - ie, the real structure will definitely be able to withstand this lower-bound load, although we won't know how much higher the actual failure load is.

[Figure: a structure under an imposed load, with an assumed load path]
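
A numerical counterpart for the lower bound theorem, again my own illustrative example rather than anything from the referenced text: for the same fixed-ended beam, assume a simply supported moment diagram - a statically admissible load path that ignores the end fixity entirely - and find the largest load for which the bending moment nowhere exceeds Mp.

    # Lower bound sketch for the same beam: assume a simply supported load
    # path (ignore the end fixity).  For a point load P at distance a, the
    # peak moment of this path is P * a * (L - a) / L, so the largest P
    # keeping that at or below Mp is a guaranteed-safe lower bound.

    L, Mp, a = 10.0, 100.0, 4.0
    lower = Mp * L / (a * (L - a))             # peak assumed moment == Mp
    true_load = 2.0 * Mp * L / (a * (L - a))
    print(lower, true_load)                    # 41.7 <= 83.3: safe but conservative

The factor-of-two gap between the two bounds in this toy case is exactly the sort of spread discussed below: a more refined assumed moment diagram, one that made use of the end fixity, would raise the lower bound towards the true collapse load.
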
For a coarse, back-of-envelope analysis of a structure subject to an imposed load, upper and lower bounds can easily be derived, and the structure sized so that the lower bound sits above the load to be designed for. However, there will be a gap between the upper and lower bounds. For a large, expensive structure, such as a bridge or transmission tower, a simple structural analysis would therefore result in a structure much stronger (and more expensive) than it needed to be, simply to ensure that the lower bound was above the design load. If the accuracy of the structural analysis were increased, the upper and lower bound estimates would move closer together, reducing the uncertainty in the strength of the structure. Combined with an iterative design approach, this enables the identification of a design that minimises construction material volume and cost, while still being strong enough to withstand the specified design load.

The fact that our understanding of materials is limited, and that our complex structural analyses are based on approximations, means that even increasingly sophisticated upper and lower bound calculations will almost never result in the same limit load - but they will still give us sufficient certainty to design a structure that is reasonably economical to construct, with a reasonable margin of safety built into the design, giving confidence in the structure's acceptability.


Limits of static analysis

An important point is that the upper and lower bound theorems are usually only applied to static structures. Once real structures are loaded past their actual strength limits, they will begin to fail. It may be possible to identify the first point of failure, but then the subsequent modes of failure may become increasingly difficult to identify, as dynamic effects (momentum and elastic energy) come into play, and the load paths in the structure change in an increasingly chaotic fashion as more and more elements fail.

The primary objective of structural engineers is to ensure that a structure never fails! This is done by focusing on static structural analysis, which is valid because a static structure undergoing dynamic motion has, by definition, already failed. Investigating actual modes of failure is largely left as an academic question, or as an investigative exercise when something really does fail and the cause needs to be determined, so as to prevent recurrence in similar structures elsewhere. The Royal Commission into the collapse of span 10-11 of Melbourne's Westgate Bridge on 15 October 1970, killing 35 men, is an excellent example of this. The purpose of the Royal Commission was to determine the mode, and ultimate cause, of the fatal collapse, and to review whether aspects of the structural design were undesirable. Although the report is not available online, I have a copy, and may review it in a future post.


Stress absorption and transmission

A crucial concept for understanding some of the ways in which systems can fail is the idea of transmitted stresses between system components. A system component subject to stress can either absorb that stress internally, relay that stress to connected components, or perform some combination of the two. As an example, consider the case of an individual who takes out a mortgage to purchase a house, with her parents putting their house up as a guarantee on the loan. Let's assume that there is an economic downturn, and that the individual loses her job at a technology startup. If she has some extra savings set aside, then she can react to the imposed financial stress of losing her job by dipping into her extra savings to continue to pay the mortgage while she looks for new employment. This would be an internally absorbed stress. If she has no savings set aside, however, and is forced by the bank to sell her house into a weak market for less than the value of the mortgage, then the bank could call in the mortgage guarantee and force her parents to come up with the cash to pay out the mortgage. This would be a transmitted stress, since the effect of losing her job has been for a financial obligation to be placed on her parents, by means of the mortgage and the mortgage guarantee.
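
In code, the absorb-or-transmit choice can be sketched in a few lines (a deliberately crude model of my own, with the savings buffer standing in for internal absorption capacity):

    # Sketch: a component absorbs stress up to its buffer, and transmits
    # whatever it cannot absorb to the next component along the connection.

    def apply_stress(stress, buffer):
        absorbed = min(stress, buffer)
        transmitted = stress - absorbed
        return buffer - absorbed, transmitted   # remaining buffer, passed-on stress

    savings, passed_to_parents = apply_stress(stress=30_000, buffer=40_000)
    print(savings, passed_to_parents)           # 10000 0 - fully absorbed

    savings, passed_to_parents = apply_stress(stress=30_000, buffer=5_000)
    print(savings, passed_to_parents)           # 0 25000 - mostly transmitted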


Tough systems and brittle systems

Using the concepts of stress absorption and stress transmission, I define a "tough" system as one where imposed stresses are predominantly absorbed within system elements - changes in component inputs result in changes in component states, but little change in component outputs. A tough system subject to increasing imposed stresses will eventually fail, but there will be a process of internal stress absorption, followed by stress transmission between components, before system failure occurs. A defining characteristic of a tough system is that there is a significant increase in relative stress between the level at which system adaptation begins (stresses begin to be transmitted between elements) and the level at which complete failure of the system occurs. In the example above, the borrower having savings set aside would be an example of a tough system.

In a similar fashion, I define a "brittle" system as being one where imposed stresses result in stresses being transmitted between elements, rather than being absorbed within individual elements. Since stress transmission occurs readily, increased imposed stresses will quickly lead to changes in many components of the system, and hasten the complete failure of the system. A defining characteristic of a brittle system is that there is only a small increase in relative stress between the level at which stresses begin to be transferred between elements, and the level at which complete system failure occurs. In the above example, the borrower having no savings set aside to cope with unexpected financial problems would be an example of a brittle system, since an imposed financial stress would propagate readily.

A brittle system is not necessarily more prone to failure than a tough system, since failure of either system depends on the likelihood of a stress greater than the failure stress occurring, and this is dependent on the system environment. The difference is in the response of the two types of system to increasing stresses. Any increase in stress in a brittle system will lead the system towards failure more quickly, and there will be fewer warnings of impending system failure.
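
Extending the earlier absorb-or-transmit sketch to a chain of connected components makes the distinction visible: with generous buffers an imposed stress dies out early in the chain, while with thin buffers it propagates to every component (again, a toy model of my own with made-up numbers):

    # Sketch: propagate an imposed stress along a chain of components,
    # each absorbing what its buffer allows and transmitting the rest.

    def propagate(stress, buffers):
        reached = 0
        for buffer in buffers:
            if stress <= 0:
                break
            reached += 1
            stress = max(0.0, stress - buffer)
        return reached   # number of components affected by the stress

    tough = [40.0, 40.0, 40.0, 40.0]    # generous absorption capacity
    brittle = [2.0, 2.0, 2.0, 2.0]      # components near their limits

    print(propagate(50.0, tough))       # 2 - absorbed early in the chain
    print(propagate(50.0, brittle))     # 4 - stress reaches every component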


Efficiency

I'll define a system as efficient if the specified level of system service cannot be provided at lower cost, whether in terms of money, resources, people or capital. Since greater capacity in a system generally requires more money, resources, people or capital, an efficient system is likely to be made up of components operating close to their individual capacities - and such a system will therefore tend to be brittle.


Redundancy in systems

Returning to a high level perspective, our civilisational systems are often complex. When operating normally, they provide specific system services - transporting cars across a river (the Westgate Bridge), conducting financial transactions between individuals (a financial system), conveying phone conversations between two mobile handsets (a telecommunications system), permitting decisions to be made on public infrastructure (our political system), and so on. Each of these systems is made up of many subcomponents. Some components can fail without imperilling the system (redundant components), while the failure of others (essential components) will mean at least partial failure of the overall system.

Redundancy is a property separate from stress absorption/transmission, brittleness/toughness and efficiency - it is the sensitivity of a system to the failure of a specific component. It does not express the actual likelihood of that failure occurring. When discussing the effects of changes in a system's inputs, toughness and brittleness are much more meaningful characteristics to investigate. However, when considering factors that might cause the failure of a specific component, the system's redundancy in that component becomes relevant.

When components of our systems fail, then the redundancy of the overall system in those components can vary - from complete redundancy (other components will take over the role of the failed component, and the system will continue to operate with no loss of functionality) through partial redundancy (other components will take over some of the roles of the failed component, but not all - and the overall system will exhibit degraded performance) through to no redundancy (the overall system loses all functionality).

It is not necessarily a problem if a system has low redundancy in a particular component - what matters is the risk that a particular component will fail, combined with the redundancy of the system in that component. If a component has a high risk of failure, but the system is redundant in that component, then the risk posed to the system is relatively low. If the component has a high risk of failure and the system is not redundant in that component, then the risk posed to the system is high.
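
One rough way of making this concrete - a scoring sketch of my own, not a formal risk assessment method - is to combine a component's failure probability with the fraction of its role that other components can take over:

    # Sketch: rough risk score = P(component fails) x function lost on failure.
    # "redundancy" here is the fraction of the component's role that other
    # components can take over (1.0 = complete redundancy, 0.0 = none).

    components = {
        "backup_link":   {"p_fail": 0.20, "redundancy": 1.0},
        "batch_process": {"p_fail": 0.02, "redundancy": 0.0},
    }

    for name, c in components.items():
        risk = c["p_fail"] * (1.0 - c["redundancy"])
        print(name, round(risk, 3))
    # backup_link 0.0    - failure-prone, but fully redundant: low system risk
    # batch_process 0.02 - rarely fails, but no redundancy: the bigger risk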

The concept of redundancy also provides a way of evaluating the risk posed to a system by something that is not explicitly considered in a system model, such as error. If a system has little or no redundancy in a component, it is wise to investigate whether anything might trigger the failure of that component, and to ensure that strategies are employed to reduce or eliminate that risk. This technique is particularly valuable when the system model used to assess stability against imposed stresses does not include factors that pose a risk to non-redundant components.

The NAB batch payment failure on the night of 25 November 2010 illustrates these ideas. Although a comprehensive account of the causes is not available, it appears that a corrupted batch payments file was submitted to NAB's central database, and that errors caused by the corrupted file then propagated through the rest of the banking system. This resulted in frozen company accounts, unpaid wages, and some account holders being charged twice for transactions. I think it is reasonable to describe this result as a partial failure of the Australian banking system - in other words, the Australian banking system had no redundancy in NAB's batch payment processing, at least as far as partial failure is concerned.

NAB should have had a method either of ensuring that batch payment files submitted to their database were not corrupted, or of quarantining database outputs (while maintaining progressive, restorable backups of database state) until it was certain that earlier inputs to the database were valid and the results of internal processing correct, whereupon onward processing to the Reserve Bank and to other banks could be released from quarantine and sent. It would be of great interest to know more about the causes and chronology of this partial failure of the banking system.
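
The actual cause and NAB's internal systems are not public, so the following is purely a hypothetical sketch of the validate-then-release pattern described above; every name in it is invented and nothing reflects NAB's real processes:

    # Hypothetical sketch of a validate-then-release (quarantine) pattern
    # for batch payment files.

    import hashlib

    def checksum(payload: bytes) -> str:
        return hashlib.sha256(payload).hexdigest()

    def process_batch(payload: bytes, expected_checksum: str, quarantine: list):
        # Reject corrupted input before it touches the database.
        if checksum(payload) != expected_checksum:
            raise ValueError("batch file failed integrity check - not processed")
        # Hold the resulting outbound payments in quarantine until the
        # batch is verified, rather than sending them onward immediately.
        results = [line for line in payload.splitlines() if line]
        quarantine.append(results)
        return results

    def release(quarantine: list, verified: bool):
        # Only release onward (e.g. to other banks) once verified; otherwise
        # discard the suspect output and restore from the last good backup.
        if not verified:
            quarantine.clear()
            return []
        released = list(quarantine)
        quarantine.clear()
        return released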


Feedback loops

Feedback occurs when a system contains a cyclic path, so that a component's output eventually influences its own input. Feedback can be negative or positive - negative feedback serves to stabilise a system, while positive feedback serves to destabilise it. Feedback loops are common in many systems. As an example, in financial systems, asset prices depend on investor perceptions of value, investor perceptions of value are influenced by the observed behaviour of other investors, and the behaviour of those investors is in turn influenced by asset prices. Whether the loop is positive or negative depends on the interactions between asset prices, perceptions and investor behaviour. Hearing aids are subject to "squealing", which is due to acoustic feedback, and the throbbing of poorly adjusted cantilever brakes on bikes is also a feedback effect.
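
A one-line model captures the difference (my own toy example): treat the deviation of an asset price from its fair value as being fed back into the next period's deviation through a loop gain. A gain of magnitude less than one damps disturbances away; a gain greater than one amplifies them without limit.

    # Toy feedback model: the deviation of a price from its fair value is
    # fed back into the next period's deviation through a loop gain g:
    #   deviation[t+1] = g * deviation[t]
    # |g| < 1: the loop damps the disturbance (stabilising)
    # |g| > 1: the loop amplifies it (destabilising)

    def simulate(gain, deviation=1.0, steps=5):
        path = [deviation]
        for _ in range(steps):
            deviation = gain * deviation
            path.append(round(deviation, 3))
        return path

    print(simulate(0.5))   # [1.0, 0.5, 0.25, 0.125, 0.062, 0.031] - dies away
    print(simulate(1.5))   # [1.0, 1.5, 2.25, 3.375, 5.062, 7.594] - runs away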

Systems can be designed to adapt themselves so that positive feedbacks are controlled - this requires a system where the function of the system changes in response to the system's internal state. This is a complex subject, and a proper treatment would make this essay unacceptably long! However, an understanding of feedback processes is often essential to understanding system behaviour.
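
Without attempting that proper treatment, here's the flavour of such a self-adjusting mechanism in a few lines (my own toy example, continuing the loop-gain model above): if the system observes its own deviations growing, it backs its loop gain off.

    # Sketch: crude self-adjustment - if the deviation is still growing,
    # the system reduces its own loop gain until the loop stabilises.

    def adaptive_step(deviation, prev_deviation, gain):
        if abs(deviation) > abs(prev_deviation):   # loop is amplifying
            gain *= 0.9                            # back the gain off
        return gain * deviation, gain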


Consequences of failure

When considering the possibility of a system failing, the consequences of that failure are an important consideration, and the effort expended on preventing failure should be proportional to those consequences. For a system like Facebook, the consequences of failure are unlikely to be particularly significant, beyond missed social engagements and lost contact information. For a vaccine development program, failure may mean a higher chance of death from the disease the vaccine is intended to prevent, or significant harm to vaccinated individuals from unexpected side effects. For a system like an Airbus A380 or a skyscraper, failure may mean the deaths of large numbers of people.


Conclusion

I will wrap up by making two points. First, a good understanding of the system being discussed is an essential prerequisite to meaningful and intelligent discussion of that system. If one does not understand the workings of a system, then one cannot have confidence in the value of one's commentary on that system.

Second, while it is valuable to be critical of human-created systems - if those criticisms are well founded - it is also important to make suggestions on how systems can be modified, either to reduce the possibility of failure, or to reduce the consequences of failure. In other words, it is good to criticise, but just as important to say how something can be improved.

My next few posts are intended to be financial, but I'll detour as interest dictates! Going by the volume of my text-file and notebook jottings, I'm not expecting to run out of ideas any time soon. I may not be as prolific a poster as some other financial commentators, but I hope that each post will be a considered and thought-provoking essay.

I hope that you've all enjoyed reading this first Systems and Limits post - and that it brings you back for more!

Cheers

Andrew