Systems & Limits: 2011

Sunday 6 November 2011

Monetary systems - an introduction

Human beings require many goods and services for daily life - water, food, accommodation, clothing, healthcare and education are just a few examples. It might be theoretically possible for each individual to provide the goods and services they require for daily life by their own labour and resources, but this is massively inefficient in its use of time and resources. Consider the example of an individual who makes shoes for themselves only. A lot of effort is required to develop the skills and resources necessary to make a single pair of shoes, and the tools developed require storage space even when they are not in use. The same individual still needs to obtain other clothing, food and shelter, and it is unlikely that they will have the time and resources to specialise in all of these things. Clearly, this is not a particularly efficient way to do things.

The solution to this problem is for individuals to specialise in manufacturing or providing a larger quantity of specific goods or services, and to trade those goods or services for goods and services from others that the individual cannot provide themselves. This has the advantage of ensuring that each individual makes maximum use of their own skills and resources, but also has access to the products of the skills and resources of others also working efficiently in other areas of specialisation.

Mutual beneficiality
When people obtain goods or services under conditions free of duress or obligation, each party to the transaction (assuming the transaction is closed out with no ongoing obligations or debts) receives something that is of greater perceived value to them than what they have given up. This is the condition of mutual beneficiality. If this is not the case, then at least one party will not be willing to engage in the transaction, and no exchange will occur.

The previous paragraph implies that each party must place a different value on at least one component of the exchange, otherwise mutual beneficiality cannot exist. This condition of differing relative values is a necessary condition for a trade to occur, assuming the absence of compulsion or threat! (Command economies, such as those in China after the Chinese Civil War and the ensuing Great Leap Forward of 1958-61 and Cultural Revolution of 1966-76, in Cambodia after Year Zero (1975 when Pol Pot seized power) or in the Soviet Union after the 1917 October Revolution, involved degrees of compulsion and threat!)
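The condition of differing relative values can be checked mechanically. The following is a minimal Python sketch, not a model of any real market: the `Party` class, the function name and the numeric valuations are all made up for illustration, and it assumes each party's perceived value of a good can be written down as a single number.

```python
from dataclasses import dataclass

@dataclass
class Party:
    name: str
    values: dict  # perceived value this party places on each good (hypothetical units)

def is_mutually_beneficial(a: Party, a_gives: str, b: Party, b_gives: str) -> bool:
    """True when each party values what it receives above what it gives up."""
    a_gains = a.values[b_gives] > a.values[a_gives]
    b_gains = b.values[a_gives] > b.values[b_gives]
    return a_gains and b_gains

# Hypothetical valuations: the two parties rank bananas and shoes differently.
grower = Party("grower", {"bananas": 2, "shoes": 5})
cobbler = Party("cobbler", {"bananas": 4, "shoes": 3})
# A second party with exactly the same valuations as the grower.
twin = Party("twin", {"bananas": 2, "shoes": 5})

print(is_mutually_beneficial(grower, "bananas", cobbler, "shoes"))  # True
print(is_mutually_beneficial(grower, "bananas", twin, "shoes"))     # False
```

When both parties place identical values on both goods, one of them must lose on the exchange, so no trade occurs.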

A requirement of any form of trade is that it must be perceived to be fair - the absence of fairness can lead to a refusal to trade or to conflict. Fairness is deeply wired into us, and even animals recognise and object to perceived unfairness in their treatment by others.

Ways of achieving provision of goods and services
For a society to use skills and resources efficiently and fairly, then, some form of free exchange of goods and services is essential. There are several arrangements that allow goods and services to be provided and obtained - or, alternatively, that facilitate economic activity. These include the household economy, the gift economy, the command economy, the barter economy, and the monetary token economy.

The household economy
A household economy is one where goods are not traded, but are instead produced and consumed by the same household. Given the small size of most households, this allows a limited degree of specialisation and efficiency, while also allowing individuals to benefit from the contributions of others in that household. If the expression "tribe" is substituted for "household", the same term might also describe hunter-gatherer societies, such as the Australian Aborigines, which were based on largely self sufficient small tribes (although trade of goods did occur between tribes). The desire for fairness will lead participants to attempt to ensure that everyone makes a "fair" contribution, to avoid other individuals free riding on the efforts of others.

The gift economy
A gift economy is well suited to small groups of people - such as small tribes of hunter gatherers, or, in contemporary Australian society, a group of schoolfriends who have regular contact with each other. In a gift economy, goods or services are freely gifted to other members of that small group. Although there may be no immediate reciprocation, since people in small groups remember who did or gave what to who, it is likely that a reciprocating gift of goods or services will occur later, and also that social standing within the group may be a partial function of one's gifting history. As a result, for each of the individuals within that small group, it is likely that they will value what they receive back from the group more than what they contribute, and they may receive goods or services that they cannot provide themselves - both beneficial characteristics of trade.

Gift economies are heavily dependent on trust, and are not suited to large groups of people or to highly specialised or expensive goods and services. They are also unlikely to provide all the goods and services required by an individual - as the number of participants increases towards that required to provide all necessary services, it becomes impossible for individuals to keep track of the gifting history of everyone else in the group, so the shared knowledge and trust required to make the gift economy function are likely to break down. Consequently, a gift economy will only form a part of an economic system, at most.

The command economy
A command economy is one where people are ordered to provide goods and services to others or a greater social grouping, but generally without the condition of mutual beneficiality. This generally requires a degree of duress or compulsion. The major problem is that when mutual beneficiality does not exist, providers of goods and services attempt to reduce their losses by providing the cheapest and most limited goods and services possible, subject to the duress or compulsion applied. This then leads to reduced quality and quantity of goods and services - which was very clear in the Soviet Union under Communism.

The barter economy
A barter economy is based on the direct exchange of goods and services between willing parties, who each value what they receive more than what they give up in order to facilitate the trade. Barter economies are more suited to larger groups of people as they are not so dependent on awareness of a counter party's trading history to ensure fairness. While no society has ever relied fully on a barter economy, all economies have probably always contained a degree of barter. Bartering is well suited to exchanges of simple goods and services, such as foodstuffs, where exchanges can be easily set up with a small pool of available goods and services.

A limitation of barter economies is that one party may desire a good or service from the other, but may not be able to provide goods or services desired by the other party, particularly if the goods are highly specialised. For instance, if I am a banana grower, and wish to exchange some of my bananas for a memory expansion card for my notebook computer, this could be slightly difficult! A possible solution to this problem is to have "rings" of service or good provision involving 3 or more parties, rather than simple bilateral exchanges, but these are awkward to arrange so that either all transactions occur simultaneously in the same place, or so that all transactions are honoured when the individual service or good provisions occur at different times and places.
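The "ring" idea can be sketched as a search for a closed chain of wants. This is a minimal Python illustration under loud assumptions: each party offers exactly one good and wants exactly one good, each good has at most one supplier, and the parties and goods named below (including the retailer and the cafe) are invented for the example. It says nothing about how the ring would actually be honoured across time and place, which is the awkward part.

```python
def find_trade_ring(offers, wants, start):
    """Follow 'who supplies what I want' links from `start`.

    offers: party -> the good that party can supply
    wants:  party -> the good that party wants
    Returns the list of parties in the ring (each receives from the next,
    and the last receives from `start`), or None if the chain never closes.
    """
    supplier_of = {good: party for party, good in offers.items()}
    ring = [start]
    current = start
    while True:
        supplier = supplier_of.get(wants[current])
        if supplier is None:
            return None   # nobody supplies the wanted good
        if supplier == start:
            return ring   # the chain closes back on the starting party
        if supplier in ring:
            return None   # the chain loops without including `start`
        ring.append(supplier)
        current = supplier

# Hypothetical three-party example.
offers = {"grower": "bananas", "retailer": "memory card", "cafe": "coffee"}
wants = {"grower": "memory card", "retailer": "coffee", "cafe": "bananas"}

# No bilateral exchange works: the retailer does not want bananas.
print(wants["retailer"] == offers["grower"])      # False
print(find_trade_ring(offers, wants, "grower"))   # ['grower', 'retailer', 'cafe']
```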

A second limitation is that some goods or services may not be naturally divisible on useful scales, so it may be difficult to ensure that the goods and services provided by each party to a transaction result in both parties perceiving themselves to be better off. If I have a goat, which is worth roughly half a cow (remembering that perceptions of value by different parties will vary), and you have a cow, and we both want living livestock, the natural indivisibility of your cow means that unless I can find a second goat, I can't trade my whole goat for half of your cow.

A third limitation of barter is the lack of any general unit of value to aid relative valuations - if I want to obtain a writing pad and pen, and can supply sock darning services in exchange, it isn't particularly useful to know that a writing pad is worth roughly one three thousandth of a cow. A fourth limitation is the lack of ability to separate the exchange in time - for instance, for one party to supply two goats at the goat market today and then receive a cow at the cow market next month.
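To put a rough number on the missing unit of value: with n distinct goods and no common unit, every pair of goods needs its own exchange ratio, which is n(n-1)/2 ratios, whereas a common unit needs only one price per good. A small Python sketch of that arithmetic follows; the function names and the figure of 1,000 goods are illustrative assumptions.

```python
from math import comb

def barter_ratios(n_goods: int) -> int:
    """Number of distinct pairs of goods, each needing its own exchange ratio."""
    return comb(n_goods, 2)

def unit_prices(n_goods: int) -> int:
    """With a common unit of value, one price per good is enough."""
    return n_goods

print(barter_ratios(1000))  # 499500
print(unit_prices(1000))    # 1000
```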

The limitations of household, gift, command and barter economies
As the range of goods and services available expands, as in any modern economy, a barter economy, or any economy based on a combination of the above forms of exchange, begins to run into significant problems. This is because the number of suppliers of a good or service relative to the population size decreases as the degree of specialisation increases. As a result, it becomes increasingly difficult to achieve mutual beneficiality when a specific individual desires to obtain a specific good or service from another specific individual.

Monetary tokens
Most societies will include a combination of these forms of economy, where the circumstances exist to facilitate them. For the exchanges that cannot be facilitated using these techniques, another method of achieving mutual beneficiality is required. The universal solution to this problem has been to create a monetary token of exchange, or store of value, which is exchangeable for goods and services, and thus breaks the requirement for a bilateral exchange of goods or services as a prerequisite for economic activity.

These monetary tokens must have four characteristics:
1. They must be usable as a medium of exchange - that is, they must be exchangeable for goods and services, and they must be in universal acceptance. In other words, they must be "fiat" currency - currency which is declared to have value.
2. They must serve as a stable store of value - that is, they should maintain their value over a long period of time so that they can be used to conduct a series of transactions occurring over a period of weeks or months.
3. Any good or service must be able to be valued in terms of the monetary token, which also requires that the monetary tokens be divisible into sufficiently small quantities to cover most economic exchanges.
4. They must be of intrinsically low value, so that they are not destroyed and used for other purposes, such as coins with a metal value greater than the face value of the coin.

Fiat money in Australia
Fiat money is money which is generally accepted as having value. The way in which value is conferred on fiat money does not appear to be well understood. Let's start with an inspection of an Australian banknote. All Australian notes bear the inscription, "This Australian note is legal tender throughout Australia and its territories". Thus, the first characteristic, fiat currency status, is achieved by the agency of legal tender. The term "legal tender" does not mean that the note *can* be used as tender (indeed, in Australia, it appears that any object could conceivably be used by two consenting parties as tender), but rather that, under Australian law, if one person owes another a debt of, say, ten dollars, and offers a ten dollar note in payment of that debt, the other person cannot refuse to accept that note in payment of that debt. This would apply if a debt is incurred by one party for goods or services already received, and then payment of that debt is offered later, such as if one eats dinner in a restaurant and then pays the bill afterwards.

There appear to be caveats: if payment in some other form was specified at the time the debt was incurred, then payment with legal tender may not be enforceable, and payment in legal tender can be refused if the good or service is to be provided at the time of the prospective payment, since no good or service has yet been provided and so no debt yet exists (as when purchasing food at a takeaway shop). There are also rules relating to the amounts owing for which various denominations of currency are legal tender - for instance, coins up to and including 50c are only legal tender for amounts owing up to $5, if they are the only coins offered. The Reserve Bank has a page on the subject, which clearly states the legal situation.

The universal acceptance of notes and coins is also assisted by the recycling of defaced and damaged notes and coins back to the Reserve Bank by the commercial banks, which are replaced by the Reserve Bank and returned to the commercial banks for general circulation. This helps to prevent the situation where merchants or customers refuse to accept damaged notes or coins in payment or change.

The second desirable characteristic of monetary tokens is achieved by government action to ensure that the supply of tokens amongst economic participants is kept stable, and also by enforcement action to prevent illegal counterfeiting of the physical tokens of exchange.

The third characteristic is achieved by setting the value of the currency so that the smallest denomination of currency is acceptably small to be used for minor transactions without complaint.

The fourth is achieved by manufacturing coins from cheap but durable metals, and notes from plastic. A plastic banknote has virtually no intrinsic value - it is too smooth to write on or blow one's nose with, too small to burn for heat, or to use to make anything of intrinsic economic value. The situation with coins is a little more complicated - metal is a useful resource, and coins could be melted down and used to make goods. By ensuring that the face value of the coin remains above the intrinsic value of the metal, this result can be prevented.

Australia's monetary tokens
In Australia, as in virtually all modern world economies, we have three main types of token, which are generally used and accepted as money. They are as follows:
1. Fiat money - consisting of physical notes and coins. These are a purely physical form of money.
2. Bank credit - either money that the bank owes us (deposit accounts) or money that we owe the bank (credit cards and mortgages). These credits only exist on the books of the commercial banks.
3. Reserves at the Reserve Bank of Australia, as accounts belonging to external parties. These reserves only exist on the books of the RBA.

Under benign economic conditions, account holders, commercial banks and the RBA are each able to perform specific exchanges of these monetary tokens. These tokens are defined by the exchanges that can be performed with them, and by the rules that apply to them. These exchanges will be the subject of the next post!

Thursday 7 April 2011

Japan observations, and some ideas on complex systems

I've been quiet for a little bit - a bit of pressure at work, a weekend away and some wheelbuilding for a cycling holiday I have coming up later this month have been taking up a lot of my spare time! I also spent some time working on a post on Victorian water supply security and the desalination plant, but I wasn't particularly happy with the result, so I set that aside, and had a crack at some contemporary affairs commentary.

The effects of the Japanese earthquake on efficient systems

Over the past few weeks, the major item of news has been the Great Eastern Japan earthquake on 11 March, and the event's aftermath. The most serious effects appear to have been the tragic loss of life resulting from the earthquake and tsunami, the severe damage or total destruction of infrastructure, the disruption of the lives of the survivors, and the ongoing struggle to bring the severely damaged Fukushima Daiichi nuclear power plant under control. These four subjects have been attracting most of the media attention.

There have also been a small number of articles discussing supply chain problems resulting from reductions in Japanese manufacturing capacity, and these real world examples have illustrated some of my ideas from my earlier posts in a startling fashion. It is one thing to start with pre-existing data and develop a theory to explain that data, which is always open to charges of theory tweaking to fit the data - but quite another thing when new, independent data emerges that backs up the theory.

To recap, my theory was that efficient systems are brittle, and prone to failure when the environment changes. In the case of Japanese manufacturing, significant damage occurred to manufacturing capacity - which is hardly surprising, given the scale of the devastation. What was not so obvious was the rapidity with which this capacity destruction spread to affect manufacturers in Europe, North America and elsewhere.

The first article about the supply chain problems resulting from the Japanese events is this one from the Age.

The opening quote speaks for itself: "The disaster in Japan has exposed a problem with how multinational companies do business: The system they use to keep supplies rolling in is lean and cost-effective - yet vulnerable to sudden shocks." The article continues on in a similar vein, but this first line sums it up perfectly.

A large and diverse number of companies have been affected. According to this article from the New York Times, the production of General Motors' Chevy Volt in the US may be affected, Nissan's engine production plant in the affected area is out of action, Texas Instruments (a US based company) might not resume full capacity production at their Japanese plant until September, Toshiba has closed some NAND flash production lines, SanDisk had concerns about transportation and power supply reliability, while Sony, Canon, Pioneer (home entertainment) and Kirin (beer) may also be affected as they ship their products from ports which have been severely damaged or destroyed by the tsunami.

This article from Der Spiegel discusses the effect of the Japanese devastation on two competing fan manufacturers in Germany, EBM Papst and Ziehl-Abegg. Although competitors, they both obtain essential chips from Toshiba in Japan, and the Toshiba factory that manufactures their chips has been damaged. Consequently they both expected that they would need to shut down their production lines, perhaps for one to two weeks, if delivery of the chips was delayed. The article does not speculate on the consequences if they needed to find an alternative supplier and obtain new stock from them prior to restarting production. In addition, the Japanese manufacturer of the transmission for Porsche's Cayenne SUV is experiencing disruptions to production, a chip manufactured by Toshiba is used in Apple's iPad, and the German carmaker Opel has announced the cancellation of some manufacturing plant production shifts due to a shortage of components from Japan.

The most illuminating paragraph in the whole article is probably this: "The assembly lines at EBM Papst and Ziehl-Abegg now depend on a handful of electronic components from Japan, often costing little more than a few cents. But the transformers, resistors and memory chips are vital components in products ranging from fans for laptops and car engines to the air-conditioning systems in New York skyscrapers and hotels in Mecca."

What this demonstrates is that the manufacture of these highly specialised parts is incredibly efficient and at a low per-component cost, due to the economies of scale resulting from high volume production of a single component, which is then sold to huge numbers of customers worldwide - yet the specialisation and complexity of that part, combined with the lack of other manufacturers making an equivalent part, means that any interruption to supply rapidly propagates around the globe, with no alternative suppliers immediately available. There has been recent media coverage of limited stock of iPad 2s after their launch - I suspect this was due to the damage to Toshiba's chip manufacturing plant.

This illustrates that in modern manufacturing, manufacturers generally buy components from a single supplier in high volumes, which may also be shipped long distances due to the low cost of air freight. Further, manufacturers keep little stock on hand as a buffer against supply disruptions, in order to maximise financial efficiency by reducing warehousing costs as much as possible.
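The interaction between buffer stock, single sourcing and a supply outage can be sketched with a toy day-by-day simulation. Everything here is hypothetical: the function, the quantities (100 components used per day), the 14 day outage and the 30 day horizon are invented to show the shape of the effect, not taken from any of the articles.

```python
def days_halted(stock, daily_use, deliveries_per_day, outage_days, horizon):
    """Count the days on which the assembly line stops over `horizon` days.

    deliveries_per_day: one entry per supplier (units delivered per day)
    outage_days: one set per supplier, the days on which it delivers nothing
    """
    halted = 0
    for day in range(horizon):
        # Receive whatever the suppliers can deliver today.
        for units, outage in zip(deliveries_per_day, outage_days):
            if day not in outage:
                stock += units
        # Run the line only if a full day's components are on hand.
        if stock >= daily_use:
            stock -= daily_use
        else:
            halted += 1
    return halted

outage = set(range(5, 19))  # hypothetical 14 day outage, days 5 to 18

# Lean: one supplier, one day of stock on hand.
print(days_halted(100, 100, [100], [outage], 30))            # 13
# Buffered: one supplier, fifteen days of stock on hand.
print(days_halted(1500, 100, [100], [outage], 30))           # 0
# Dual sourced: two suppliers, only one affected by the outage.
print(days_halted(100, 100, [50, 50], [outage, set()], 30))  # 6
```

In this toy model the lean configuration stops almost as soon as deliveries stop, while either a larger buffer or a second supplier absorbs some or all of the shock, at the cost of warehousing or of splitting volume.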

A computer network example

Another example of an efficient, brittle system is the network discussed in this 2008 article from the Oil Drum. The entire article is well worth reading. I read it at the time, and then forgot about it for a few years - but during a search for some unrelated material a few weeks ago, I came across it again. On rereading, I was startled just how well it illustrated my ideas on brittleness - with a disturbing twist. The author (aeldric), in discussing a failure of a computer network due to a faulty software driver on a single machine, focuses on the concept of the "frequency" of a system, and couches his (her?) discussion in slightly different terms - but the ideas expressed are directly analogous to mine on stress transmission through systems, and overall brittleness of systems.

The disturbing twist in aeldric's case study is that computer networks, and the Internet, were originally designed to be robust - so that these networks could continue to function, even in the event of failure of any given component. What the case study shows is that financial imperatives can take over in network management, and the network is then made more efficient in order to reduce financial cost. This decrease in cost comes at the expense of losses of system redundancy in specific components, which can then quickly cause overall system failure when those specific components fail. The message to be drawn from this is to avoid assuming that our internet-based services are robust - they may not be, and if they fail, they can fail almost instantly.
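Loss of redundancy in a network can be made concrete by asking which single component, if it fails, splits the rest of the network. The Python sketch below is an illustration only: the two small topologies and all names are invented, it is not the network from aeldric's case study, and it treats "failure" as a node simply disappearing.

```python
from collections import deque

def still_connected(links, failed):
    """True if all surviving nodes can still reach each other once `failed` is down."""
    nodes = {n for link in links for n in link} - {failed}
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        if failed not in (a, b):
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Breadth-first search from an arbitrary surviving node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == nodes

def single_points_of_failure(links):
    """Nodes whose individual failure disconnects the remaining network."""
    nodes = {n for link in links for n in link}
    return sorted(n for n in nodes if not still_connected(links, n))

# Hypothetical redundant design: every node has two routes to every other.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
# Hypothetical cost-reduced design: everything routed through one hub.
star = [("hub", "A"), ("hub", "B"), ("hub", "C"), ("hub", "D")]

print(single_points_of_failure(ring))  # []
print(single_points_of_failure(star))  # ['hub']
```

The star uses the same number of links as the ring, but it has a component with no redundancy, and losing that one component takes everything else with it.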

Complex Systems

Since my first few posts, I've been thinking more about the issue of complex systems, triggered by a few problems in the banking system earlier this year. The defining characteristic of these problems was complexity - and my systems theory (so far) says little about complexity! So, some extension is required. There is a correlation between efficiency and complexity - efficient systems will often be complex. So, what are the implications of complex systems? Following are some of the ideas I've come up with.

I have a mental picture of what "complex" means, but I need to define it if I'm going to discuss it meaningfully! So, I will define complex systems as tending to be large, rigid, hard to understand, prone to incorrect implementation, efficient, and brittle.

A "large" system may have significant geographical scope, large financial cost, involve a large quantity of components or infrastructure, employ a large number of people, or interact with a large number of other parties or things. The trend of complex systems to become large might be a direct consequence of their efficiency - if they are competing against other, less efficient, systems, then they have advantages which are likely to result in users of the less efficient system transferring to the more efficient system. This then becomes a mechanism for system growth.

By "rigid", I mean that the system is designed to work in a specific way - for instance, it may only take a particular type of input, it may only provide a fixed set of features, or it may be dependent on the continued validity of a design assumption. If the input form changes, a new feature is desired, or the design assumptions are rendered invalid, then the system needs to be modified so that it will continue to function.

"Hard to understand" is a direct consequence of complexity and self explanatory, while by "prone to incorrect implementation", I mean that it is easy to make an error in system design or construction so that under some scenarios it will not generate the correct response (In software design, these are called "bugs"!) Efficiency and brittleness have already been discussed in earlier posts, so I will not rehash them here.

These properties of complex systems lead to several consequences.

One is that large systems are often expensive and time consuming to create - so they cannot be easily replaced if they fail.

The rigidity and expense of complex systems combined with a desire for new features will often trigger the need or desire to modify the system to incorporate the new feature, as opposed to creating a replacement from scratch. The person or persons attempting to modify the system then need to develop a full understanding of the components of the system that they are intending to modify, so as to add in the new feature without breaking existing features. For on-line systems such as modern banking, an additional requirement is the need to maintain correct system operation while the changes are being introduced.

Because complex systems have the property of being hard to understand, an insufficiently carefully planned modification can break the system. For my day job, I write software for hearing aids - I often need to modify code in order to introduce new sound processing algorithms to a device build. Hearing aid software is highly complex, since many different algorithms need to be run on the audio input samples, while simultaneously maintaining low delay sound processing, with no breaks in the audio output. One of the guiding principles I follow when modifying code is that I need to understand exactly what a piece of software does before I modify it - otherwise I might break some undetected functionality of the code.

If changes to a complex on-line system are not done correctly, then not only can the system fail, but an additional problem - of needing to somehow restore the system to a valid state - emerges. A major characteristic of the NAB batch processing file failure (discussed in my first post) was that as a result of the corrupted batch processing file, the bank ended up in a state where their database was processing new transactions correctly, but existing bank balances were wrong - that is, the system state was incorrect. This appears to have been the major cause of the ongoing problems - the need to restore the customer bank balances in their database to a correct state, by means of manual checking and processing, although the system was by then processing new transactions correctly.

The following two articles are about the ASX failure on 1 March this year. One of them attributes the failure to a problem with a Nasdaq OMX system, which was introduced in November 2010.

ASX trading resumes after tech woes
"Trading on the Australian Securities Exchange resumed at the normal time this morning, but the problem behind the disruption to trade on Monday remains unresolved."

Computer breakdown paralyses trading on ASX
"ABOUT $1.5 billion in turnover was reportedly wiped from the Australian Securities Exchange yesterday after a computer problem forced the sharemarket to close abruptly at 2.48pm. A problem with the new trading system left the exchange with 149,513 fewer trades than the 2010 daily average. It is reported to have about $1 billion worth of trades an hour."

It appears that the problem may have been due to a bug in the implementation of the Nasdaq OMX, although if it was caused by a hardware failure, then perhaps the problem could be attributed to the brittleness property, which is a consequence of an efficient system.

Going by my stated attributes of complex systems, Nasdaq OMX is clearly a large system, it is hard to understand since the cause of the failure was not quickly determined, it may have been prone to incorrect implementation (if the failure was due to a bug rather than a hardware failure), it was likely to be highly efficient as it was replacing an existing system, and it was brittle as it failed quickly. This brittleness is another possible indication of efficiency (efficiency implies brittleness, but brittleness may not always imply efficiency).

The following CBA failure, allowing people to overdraw cash from their accounts at ATMs, was startling. It is highly disturbing that CBA chose to allow their ATMs to go into stand-in mode, rather than shutting down the network until the problem could be rectified. According to the articles, CBA understood the likely consequence of this action - that ATM users would be able to withdraw more cash than was available in their accounts. In choosing to go to stand-in mode, the bank then turned what should have been an in-house problem into one that triggered police involvement, which was a significant waste of public resources. I suggest that this was also a means for the bank to shift the costs of their internal problem onto external parties - something that any taxpayer should be strenuously objecting to!

CBA's Netbank hit by tech gremlins
"Update: Police have issued a warning after reports that more than 40 Commonwealth Bank ATMs have been dispensing large amounts of cash. Police are unsure at this stage what has caused the fault and are liaising with the Commonwealth Bank, which has been hit all day by a technical glitch that has disrupted its online banking, ATMs and EFTPOS services."

Faulty ATMs spitting cash after technical glitch
"The Commonwealth Bank took a calculated risk and placed its ATMs into "stand-in" mode yesterday knowing that it would mean customers could overdraw their accounts. The bank confirmed it encountered an issue "when conducting routine database maintenance" but rather than shutting down its network of ATMs while the problem was being fixed, it placed them into stand-in mode to allow people to continue to have access to funds."

According to this Age article, a security consultant who had previously worked for CBA stated that the problems related to CBA's "core banking modernisation" project. The article helpfully provides a link to the CBA media release, titled "Commonwealth Bank Core Banking Modernisation". According to the media release, the purpose of the project is to replace internal legacy banking systems with a new, more efficient banking system. This is a very high risk project - not only is the CBA attempting to replace an entire system with a new system, but they are attempting to do so while maintaining system functionality! It would be interesting to interview some of the technical staff working on this porting project.

Blog observations

I've chatted to a few people who have read my early posts. My fear was that I was trying to write about subjects too technical for a well educated (but not technically trained) audience, and that I wasn't giving enough examples - but that doesn't appear to be the case - thanks EJ! The blog viewing stats have been surprising - there has been an ongoing level of views, and a few new followers, despite few recent posts. I'm drawing the conclusion from this that, as long as I'm not just rehashing recent news (which can date quickly) then good content remains relevant and people are still interested, even if it's a week or two old. So I think my decision to can the water/desalination article (despite quite a few hours of work) was the right one, and I'll keep the emphasis on turning out interesting ideas which are well written up, rather than going for volume.

I've found blogging quite challenging - sometimes the ideas just flow and something comes together, sometimes it takes a few goes and a few fresh starts before a set of ideas are represented clearly and in a way which makes logical sense.

The next post

For my next post, I'm intending to investigate a gedankenexperiment - a thought experiment. The proposed topic of the gedankenexperiment is this: If you borrow money from the bank to buy a house, this parcel of money is then paid to the seller of the house. The seller may then turn around and use the same parcel of money in a similar fashion - to buy a different house elsewhere, and thus pass the parcel of money on to the seller of this house. The recursivity of the situation is apparent - but it poses the question, what is the ultimate fate of that parcel of money? Does it travel down an endless chain of house transactions, or does it dissipate out in some other way? Have a think about it while I compose the next post.

Thanks to everyone for reading - I'm really enjoying this!

Cheers

Andrew

Tuesday 1 March 2011

More ideas and a case study

Thanks to everyone who has commented on my first post! Taking a quick look over the stats, I'm impressed that at least one individual managed to read the entire thing on an iPhone display, and there was also a page view from the UK - where'd that come from? So far, circulation has mostly been restricted to immediate acquaintances, so most commentary has been by direct email. I think people have been receptive to the ideas described, but I suspect it was also considered a rather long (and perhaps overly academic?) post. So I'll try and keep future posts a bit shorter, and also bring in current events and examples to make them a little less abstract.

Summary of last post

For those who are coming in a bit later, the first post was an in depth discussion of several ideas and perspectives on systems behavior. The central ideas were that systems can be simulated using system models, and the appropriateness of the model depends on the system behavior under investigation. I then introduced the concept of tough and brittle systems - tough systems are capable of absorbing stresses by internal adaption while still providing their "system service" - while brittle systems subject to increasing stress will reach the failure point much more quickly. Both types of system will fail when subjected to high enough stresses. A subtle point is that whether that failure occurs depends on the proportional increase in stress. If you have a brittle system which is only subjected to small increases in stress, then it may turn out to be a durable (long lived) system, while a tough system subject to several orders of magnitude increase in stress will probably fail.

I also introduced the ideas of systems redundancy in components - if a system has no redundancy in a component (the system will fail if the component fails) then the threats to the integrity of that component should be subject to greater scrutiny. Feedback loops (positive and negative), as well as the importance of considering the consequences of failure of a system at design time, were also discussed.

New idea 1 - Inertial vulnerability

I've been thinking about these ideas a little further since, and about the idea that systems are not static - they can be deliberately changed in response to perceived threats or changes in the environment. Some systems can be changed much more rapidly than others - a computer network, or an airline route network, can be reconfigured quickly. Other systems cannot be changed on timescales of less than years, or even decades - for instance, the big mining and power generation firms have infrastructure costing billions of dollars to construct, and which has to be depreciated over time periods as long as several decades before it can be retired. The Australian Navy's submarine fleet is a similar example - the Oberon class was in use from 1967 to 2000, and the Collins class submarines, commissioned between 1996 and 2003, are intended for use until the 2020s. The replacements for the Collins submarines, intended to be in use from 2025 until the 2070s, are already being planned. Just three classes of submarine will cover over a century of operational use.

Inertial vulnerability is when a system, or some component of a system, is restricted to very slow, or significantly delayed, rates of change, meaning that it can easily be rendered inappropriate or obsolete by a changed environment. Any system which is totally dependent on a specific future, or narrow range of futures, coming to pass, and which involves obligations, infrastructural commitments or significant loan repayment periods that require years or decades to resolve, is inertially vulnerable.

New Idea 2 - Multiple valid representations

An additional idea is that systems can have multiple, equally valid, representations - depending on behavior of interest and the defined system objective. Let's take CityLink as an example - as a privately owned firm floated on the sharemarket, the predominant purpose of the company for the owners is as a vehicle to earn income. The provision of a toll road to drivers is just the means to that end. However, for drivers, the purpose of the company is to provide a time saving toll road - and the money paid for the service is the means to receiving that system service. The system objectives of the owners and of the travelling public can thus be seen as co-existing mirror images of each other - neither can achieve their system objective without the involvement of the other.

Alternatively, CityLink might be viewed (by someone interested in energy consumption) as providing a service that requires drivers to consume a given quantity of fossil fuels in order to realise a given quantity of system services. The energy analyst might ask questions about how much more efficient cars can become in their energy consumption, as a way of evaluating CityLink's vulnerability to a fossil fuel scarcity. A union organizer, on the other hand, might be more interested in how many staff they employ, and financial flows through the organization - they might look at all income to the company, and expenditures - and then investigate the distribution of the wage bill amongst different levels of management. They might use this representation to point out that more money can be paid to front line staff by reducing the size of management, or the size of management salaries.

These representations of the same system are constructed for different purposes and are very different as a result - but are equally valid. So, systems can have multiple and equally valid system models - the appropriate choice of system model will depend on one's personal position and the subject or system behavior of interest. The basis for a system model will generally start from some limited resource used by the system - energy, money and the workforce are some of the many possibilities for a toll road company.

What systems do we have that demonstrate efficiency?

And now it's time for this blog to start getting real! It's all very well to set forth a beautiful theory, but pointless if I can't relate it to the real world, and use it to understand real systems better. The big idea, so far, has been that of efficient systems being brittle. So, it's time to go system hunting! Can we find examples of systems that demonstrate some of these ideas?

Airlines

The first one that comes to mind is the airlines. Airlines are exposed to the following risks:

* New technology without a proven history of reliability
* Large numbers of assets costing in the order of one hundred million dollars each
* Organizational complexity
* A large and highly skilled workforce
* Complex logistical operations
* Significant exposure to energy prices
* Long lead times on fleet planning
* Complex asset maintenance requirements
* Different legislative requirements for each country of operation
* Intense competition with other airlines for passengers

Taking Qantas as an example, the airline has a fleet of 135 aircraft as of February 2011, including 9 Airbus A380s and 38 Boeing 737-800s (most numerous aircraft type). The Airbus A380 list price is US$375.3 million, with the Boeing 737-800 list price being US$80.8 million. The replacement cost of these 47 aircraft - approximately one third of the Qantas fleet - would be approximately six and a half billion dollars. According to the Qantas Data Book 2010, the total assets of the Qantas Group in 2010 were stated as AU$19.9 billion, against annual revenue of AU$13.8 billion. Staff and fuel bills (AU$3.4 and AU$3.3 billion respectively) each made up approximately one quarter of the operating costs of the airline - but profit (after tax) for the year was only AU$116 million, less than 1% of revenue, and less than 1/3 the cost of a single A380! A blowout of just 3.5% in either the wage or fuel bill would be enough to wipe out the year's profit. For this reason Qantas engages in complex fuel bill and exchange rate hedging strategies to try and protect their profit margins against fluctuations in the exchange rate and fuel costs - but these strategies can only provide partial protection in a high risk environment. Hedging strategies don't protect Qantas against the drop in air travel that would result from a sustained increase in oil prices, which would erode the financial capacity of the public to spend money on air travel by increasing the costs of many other goods and services.
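The arithmetic in the paragraph above is easy to check. The following is only a rough sketch using the figures quoted there - the variable names are my own, and it deliberately ignores tax effects and the difference between US and Australian dollars, so treat the results as back-of-envelope numbers rather than an accounting analysis.

```python
# Figures quoted in the paragraph above.
# List prices are in US$ million; Qantas Data Book 2010 figures are in AU$ million.
A380_PRICE = 375.3
B737_PRICE = 80.8
A380_COUNT = 9
B737_COUNT = 38
FLEET_SIZE = 135

REVENUE = 13_800.0
STAFF_BILL = 3_400.0
FUEL_BILL = 3_300.0
PROFIT_AFTER_TAX = 116.0

# Replacement cost of the A380s and 737-800s at list price (US$ million).
replacement_cost = A380_COUNT * A380_PRICE + B737_COUNT * B737_PRICE

# Share of the fleet that these two aircraft types represent.
fleet_share = (A380_COUNT + B737_COUNT) / FLEET_SIZE

# Profit as a fraction of revenue.
profit_margin = PROFIT_AFTER_TAX / REVENUE

# Fractional increase in each bill that would equal the whole year's profit
# (simplifying assumption: the extra cost comes straight off after-tax profit).
staff_blowout = PROFIT_AFTER_TAX / STAFF_BILL
fuel_blowout = PROFIT_AFTER_TAX / FUEL_BILL

print(f"Replacement cost: US${replacement_cost / 1000:.2f} billion")
print(f"Share of fleet:   {fleet_share:.1%}")
print(f"Profit margin:    {profit_margin:.2%}")
print(f"Staff blowout:    {staff_blowout:.1%}")
print(f"Fuel blowout:     {fuel_blowout:.1%}")
```

On these numbers, the two aircraft types come to roughly US$6.4 billion, and an increase of around 3.4 to 3.5% in either bill matches the year's profit.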

In order to remain profitable, airlines need to be constantly pursuing efficiency improvements in fuel usage, staff efficiency, aircraft costs, seat occupancy rates and so on, while also keeping prices low enough to compete with other airlines for market share. During the 2009/10 financial year, 82.5% of seats on Qantas Group aircraft were revenue generating, meaning that they were occupied by paying passengers. Revenue generating seat percentages are kept as high as possible through the practice of overbooking and constant adjustment of flight schedules - I suspect it has become increasingly common for flights to be cancelled, when it is possible to accommodate all affected passengers with spare capacity on other flights. Likewise, Qantas's move to the A380 has been driven by the increased efficiency of the aircraft - it burns approximately 10% less fuel per passenger than the 747. However, this is offset against the significant risk involved in any shift to a new aircraft and new technology, and also against the requirement to fill each aircraft with a much larger number of paying passengers in order to realize the potential efficiency gains.

The much-publicised Rolls Royce Trent 900 engine failure on Qantas flight 32 from Singapore on 4 November 2010 also provides an informative insight into the risks posed by new aircraft and new engines. The incident, which had the potential to cause the loss of the aircraft, led to the grounding of all six of Qantas's A380s for 23 days while the cause of the engine failure was investigated. The event exposed Qantas to significant financial losses, which had the potential to wipe out their profit margin. Qantas subsequently filed a statement of claim against Rolls Royce for financial losses due to the engine failure, which were estimated to be around $60 million in costs and lost revenue.

Given that this blog has an emphasis on systems, and both aircraft and jet engines are complex systems in their own right, the failure of the number 2 engine on Flight QF32 is worth closer examination, as is the contractual relationship between Qantas and Rolls Royce. It used to be that jet engines from engine manufacturers were bought as part of the aircraft, and owned by the airline or the aircraft leasing company - but it is now a common arrangement for the airline to rent the engines from the engine manufacturer, paying a rental rate based on engine usage. In effect, this means that the relationship between the airline and engine manufacturer has changed from one of engine purchaser and engine retailer to one of propulsion service user and propulsion service provider. The benefit of this approach for the airline is that they can reduce their financial risk as they no longer need to buy engines outright, and can pay based strictly on usage, which helps control costs when there is a drop in air travel. On the other side, the engine manufacturer has a regular income stream, but now carries the risk of a downturn in air travel reducing engine usage and therefore their rental income.

There are two engine manufacturers making engines for the A380 - Rolls Royce, and Engine Alliance (a joint venture between General Electric and Pratt & Whitney). The airlines using the A380 prefer to have at least two manufacturers making engines for the A380, as competition for market share amongst engine manufacturers helps to keep engine prices low. If there were only one manufacturer and they were abusing their dominant market position, it could take in the order of five years for an alternative engine to be designed and built by another manufacturer (the Trent 900 took 8 years to design and build), which would impose significant costs and losses on the airlines and the aircraft manufacturer. However, there is still significant competition between the two engine manufacturers for market share - given the very high fixed costs of an engine development program, a small increase in market share can correspond to a significant increase in profits. Consequently, there is great financial pressure for the manufacturers to produce the lightest, most efficient, most reliable and most powerful engines possible, with the result that there is considerable pressure to push engine design to the absolute limit of safety.

At this point it is worth diverting briefly and explaining the basic function of a jet engine, so as to better describe the current state of the art of modern jet engine design. The basic principle is that air flows into the front end of the engine, where the low pressure compressor blades (the large, prominent blades visible from the front of the engine) are located. The purpose of the low pressure compressor is to begin the first stage of compressing air flowing into the engine, so that when it reaches the combustion chamber, it is pressurized. Jet fuel is injected into this airstream in the combustion chamber, where it burns. For physics reasons to do with the air velocities involved, the resulting hot mixture of air and burnt fuel flows out the back of the engine, rather than out the front. It is at very high temperature (since a higher engine combustion temperature corresponds to increased engine efficiency) and very high velocity. This high velocity is what generates the engine thrust. As it flows out the back, it also flows over turbine blades which are mounted on shafts that run through the engine and are connected to the compressor blades in the engine intake. The hot gas flowing over the turbine blades makes them turn and drives the compressor - so the engine performs the neat trick of both generating thrust and also the power required to keep the jet generation process working.

The power of a modern jet engine is extraordinary - this YouTube video shows what a jet engine at full power can do to a light truck in the wrong place! The Trent 972B (used on Qantas' A380s) generates over 36 tons of static thrust at full power, which comes from throwing large quantities of air backwards at very high speed. There are two essential aspects to making a jet engine as efficient and powerful as possible - one is to burn the fuel as hot as possible, the other is to lose as little kinetic energy from the airstream as possible as it flows over the turbine blades, so as to maximise engine thrust. Even though the turbine blades are made of proprietary titanium-nickel-aluminum alloys which are super strong, the temperature of the hot gas is greater than the melting temperature of the alloy! Obviously, without some means of managing this problem, the turbine blades aren't going to live very long when the engine is running. The solution is to have a network of fine holes inside the turbine blade itself, which bleed cooling air from elsewhere in the engine over the surface of the turbine blade, creating a thin cushion of air that insulates it from the hot gas. Even then, this still isn't enough to permit the alloy blades to survive for long in this high pressure and high force environment - the blades also need to be grown as a single crystal, to eliminate inter-crystal boundary weaknesses from the metal! The fact that turbine blade technology has been pushed to this extraordinary extent demonstrates the limits to which the engineering and metallurgy has been pushed in order to make efficient, high power jet engines feasible. It also hints at the narrowness of the dividing line between an engine which is operating normally, and one which fails - because a modern jet engine is so efficient and has so many parts functioning near to the absolute limits of their structural capacities, the failure of just about any component will cause the failure of the engine.

During certification, a Trent 900 was subjected to a test in which the engine had an explosives package attached to the root of one of the compressor blades, and was run at full power. The explosives package was then detonated to simulate a bird strike. This is video of the test. The purpose of this test was to provide assurance that a bird strike would not result in components being ejected from the engine casing, and threatening the rest of the aircraft. In this case, the engine passed. However, expensive engine tests like this can only be justified for reasonable scenarios that might be expected to occur in use - it is not possible to test every single possible risk scenario. A bird strike is obviously a highly likely scenario, so this was tested for.

On QF32, the initial cause was the failure of an oil supply pipe leading to a high pressure bearing within the turbine. A connection had been drilled slightly off centre during the manufacturing process, so that the wall of the connection was too thin to resist fatigue cracking. When fatigue fracture of the oil pipe occurred, presumably some time after QF32's takeoff from Singapore, oil flowed out of the failed connection into places in the engine where it didn't belong - where it burnt and applied extra heat to components that were subsequently forced beyond their material limits, and failed. The subsequent holes in the engine casing, wing, wing flaps, wing spar and fuselage of the A380 were all created by bits of disintegrating turbine being flung out from the engine at extremely high speed. These photos show the extent of the damage to the aircraft.

It appears that the risk of turbine blade failure was intended to be controlled by proper engine design and manufacture so as to prevent such a scenario occurring, since the engine casing on QF32 was clearly unable to retain the bits of disintegrating turbine. The misdrilled pipe connection, and subsequent oil leak, was all it took to push this modern jet engine, representing the absolute pinnacle of mechanical and materials engineering design, outside a safe operating condition - with spectacular results that came near to causing the loss of QF32.

To illustrate another example of airline risk, the April 2010 eruption of Eyjafjallajökull forced the grounding of many flights through European airspace, due to the threat posed to jet engines by airborne ash. These flight restrictions were estimated to be costing airlines approximately US$400 million per day - unlike the Trent 900 failure on QF32, affected airlines were unable to recover costs through legal action. Likewise, the grounding of all commercial aircraft in the United States for several days after 11 September 2001, and subsequent longer term changes in travel patterns, imposed significant financial losses on many domestic US airlines. These exacerbated existing financial difficulties, and pushed many of them closer to bankruptcy.

In order to reduce costs as far as possible, airlines make assumptions about their future operating environment, and invest in aircraft on the basis of those assumptions. Getting these assumptions wrong can and does lead to the failure of an airline. A contributing factor in the failure of Ansett in March 2002, apart from a large wage bill, was that they were flying too many different types of aircraft, which imposed higher maintenance costs than other airlines with less diverse fleets.

The tight financial margins of the airline industry are beautifully illustrated by a (possibly apocryphal) quip on how to become a millionaire, attributed to either Richard Branson or Warren Buffett - "Become a billionaire, then buy an airline".

This discussion shows that the airlines are exposed to an extraordinary array of business, travel pattern, financial, energy cost, asset maintenance and technological risks, and that modern jet engines operate near to the absolute limits of what is possible, due to the constant quest for greater efficiency. It doesn't take much at all to push an airline or an aircraft into failure - they are truly "brittle" systems, in the sense discussed in the introductory post.

The next major post is intended to be a description of the workings of the Australian monetary system, which will take a while to research and write - but I have a few shorter posts planned for the interim, to keep things ticking over!

Cheers

Andrew

Tuesday 22 February 2011

Introduction

G'day, my name's Andrew. I'm a 34 year old digital signal processing (DSP) engineer, distance runner, environmentalist, skier and bike nut. My job involves writing software for sound processing on DSP chips - mobile phones, bluetooth headsets and hearing aids all use audio DSP processing. I earned an engineering/science undergraduate degree from Melbourne Uni in 1999, with physics and mathematics for the science component. This was followed by a two and a half year stint of structural and acoustic engineering in an engineering consultancy, followed by three years of grant administration with State Government, before joining another engineering consultancy for another year at the coal face, this time doing mechanical services and acoustics work. At that point I decided that something a bit more technical might be my thing. So I enrolled in a Masters of electronic engineering at RMIT, which saw me into my present job, where I've been for three years and which suits me very nicely!

I've read quite a few of the Australian economic bloggers (Unconventional Economist, Delusional Economics, Observations of an Economist Environmentalist, Houses and Holes, Tasmanian Real Estate Trouble, Critical Influence, Billy Blog), in addition to a few overseas ones (Oil Drum, Automatic Earth and Energy Bulletin), and many books on systems, finance and energy. I've also done a fair stack of reading recently (on and off line) in an effort to understand the actual workings of the Australian monetary system.

In November 2010, a coalition of some of the abovementioned blogs (UE, DE, H&H plus the Australia Institute) launched an essay competition, named Son of Wallis, to invite people to submit essays on debt, securitisation, competition and stability in the Australian banking sector. The competition name was a reference to an earlier Australian financial system inquiry (the Wallis Report), which was presented to the Treasurer in March 1997. The reuse of this name was to make the point that further reform of the Australian banking system is needed, and that new ideas are also needed. I wrote a submission to this competition.

At about the same time, I also made a separate submission to the Senate Inquiry into Competition within the Australian banking sector. 111 submissions were received, and many of them make interesting reading. My own is submission #65.

Some time last year (2010) I had a few ideas buzzing around in the back of my head, and these led me to reserve this blog page. I had a few other things competing for my time, so, although the ideas multiplied and were jotted down in various places, I didn't quite get to the point of sitting down and pulling it all together and getting it started. However, the act of writing these two submissions, combined with a week of running, reading and writing in one of my favourite places - Falls Creek - over the end of year break led me to crank things up and start writing!

There are quite a few economists and finance people already writing excellent blogs - as an engineer with mathematical training and an interest in systems theory and behaviour, I'm hoping that Systems & Limits will be a logical addition to the existing blogs, rather than present as a competitor.

System - a definition

The word "system" has several meanings. The one I have in mind is this: "complex whole, set of connected things or parts, organized body of material or immaterial things" (Concise Oxford). Our civilisation is made up of many systems - physical structures, health care, finance, education, social welfare, transportation, technology, political, justice, agriculture, energy production, energy generation and administrative systems are just a few examples.

Systems contain components, which respond to their inputs by modifying their internal states and generating outputs, according to specific sets of rules. System components can be anything - from steel beams in skyscrapers, to mobile phones in mobile phone networks, to individual account holders in a banking system, to shops in a food distribution system. Systems also have connections between components, by means of which the state of one component can affect the state of another component. These connections can include loads and forces in civil structures, signals in mobile phone networks, money and credit flows in a banking system, or food supply flows in a food distribution system.

I believe that although the type and scale of systems can vary widely, systems in general share many characteristics and attributes, and that as a result, principles and concepts can be developed to aid the study of any given system.

All systems exist within our natural environment and are a subset of our environment. They cannot continue to exist without this sustaining supersystem. The idea that the environment is a subset of the economic system is prevalent (and often implicit without being explicitly stated) but wrong. As the enormous graffitied slogan on the chimney of the Spencer St power station in the Melbourne CBD used to say - "No Jobs on a Dead Planet"! Unfortunately, the power station, chimney and prescient message were demolished in late 2007.

The common feature of all human-created systems is that they are created for a specific purpose - what I call a system service. In contrast, natural systems are not purpose driven - they are just the way they are! The purpose-driven nature of human-created systems allows us to define system failure as the failure of a system to achieve the purpose for which it was created.

In studying systems, it is useful to create a conceptual model of a system in order to investigate and explain behaviour. The objective in creating a system model is to develop a model which is a simplification of the real system, but still a reasonable representation of system behaviour. System models can be verbal descriptions, scale models, mathematical models or PC simulations, amongst other forms.

Systems can be represented recursively - as systems of systems of systems, much like Russian dolls. However, this can lead to unnecessarily complex system models. When modelling an arbitrary system, the level of detail in the system model should be selected appropriately for the system behaviour under investigation. For instance, when looking at data flows in a PC network and network capacities, PCs are sensible element choices for the system model, while the network links can be represented by connections between PC elements. As this model includes individual data sources and destinations, as well as network structure, it has sufficient resolution to determine data volumes in the network links. When looking at individual PC behaviour when infected by a virus transmitted over the network, it might be more appropriate to represent the infected PC as a sub-system, where the complex design of the operating system is represented in a fashion that allows the interactions between the virus and the operating system to be represented.
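To make the PC network example a little more concrete, here is a minimal sketch of a system model at that level of detail - PCs as elements, links as connections, and data flows between named sources and destinations. The network layout, the flow volumes and the assumption that traffic follows the path with the fewest hops are all invented for illustration.

```python
from collections import defaultdict, deque


def shortest_path(links, src, dst):
    """Return the fewest-hop path from src to dst over undirected links."""
    neighbours = defaultdict(list)
    for a, b in links:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Breadth-first search, remembering how each node was reached.
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in neighbours[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if dst not in previous:
        raise ValueError(f"no route from {src} to {dst}")

    # Walk back from the destination to rebuild the path.
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def link_volumes(links, flows):
    """Sum the data volume carried by each link for the given flows."""
    volumes = {frozenset(link): 0.0 for link in links}
    for src, dst, volume in flows:
        path = shortest_path(links, src, dst)
        for a, b in zip(path, path[1:]):
            volumes[frozenset((a, b))] += volume
    return volumes


# Hypothetical four-PC network, with pc2 acting as the hub.
links = [("pc1", "pc2"), ("pc2", "pc3"), ("pc2", "pc4")]
# Hypothetical flows: (source, destination, volume in arbitrary units).
flows = [("pc1", "pc3", 40.0), ("pc4", "pc3", 25.0), ("pc1", "pc4", 5.0)]

volumes = link_volumes(links, flows)
for link, volume in volumes.items():
    print(sorted(link), volume)
```

Nothing inside an individual PC appears in this model, which is the point - it has just enough resolution to answer the question about link volumes, and no more.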

Whenever investigating some aspect of the behaviour of a system, the representative model for a system must have component scales selected for relevance to the system behaviour under consideration. I propose to call this the system representation scale (SRS) of the system for the system behaviour under investigation. The SRS for an arbitrary problem may fall anywhere between multi-billion star galaxies in galactic super clusters when considering the Hubble Constant; down to quarks, leptons and bosons (subatomic particles making up atoms) when considering the output of high energy collisions at the Large Hadron Collider in Switzerland. Virtually all system behaviours for any systems of possible human interest will have SRSs that fall somewhere between these two extremes.

Engineered structures - the Upper and Lower Bound theorems

My introduction to systems study was during my undergraduate structural engineering studies. Our primary concern was identifying the induced forces in loaded structures, and then later determining the strength limits of structures, or designing structures to withstand specified loads. On one level, structural systems are quite simple, and simple structural analyses can be performed. On another level, however, we do not have a perfect understanding of materials, or of structures, and thus our structural analyses will only ever be approximations to the reality of the built structure in use. In addition, many structures are indeterminate for applied loads, which means there is more than one possible load path and thus the structural analysis becomes more difficult, as it is difficult to identify which load paths take which proportion of the applied load.

This does not mean that it is not possible to design structures safely - quite the contrary! The reason for this is that we can make conservative assumptions about the properties of materials and about the loads to be imposed - and combine these with the upper and lower bound theorems to derive a safe structural design. I believe some of the ideas contained in these theorems have wider validity to other non-structural systems, so I'll describe them in a little more depth.

The upper bound theorem can be stated as follows: "A collapse load computed on the basis of an assumed mechanism will always be greater than or equal to the true collapse load" (p 121, Ductile Design of Steel Structures).

This can be restated in more general terms, as follows:

If you assume a way in which a structure can fail (ie, specify points where the structure will fail, or "break" in some fashion), it is possible to calculate, using the structure geometry and assumed structural element capacities, the amount of imposed force required to make that structure fail *in that failure mode*. For the loaded structure in the figure below, two possible failure modes are shown - A and B. There is an infinite number of geometrically possible failure modes, although only one will actually occur when the structure is loaded.

For any given failure mode, the corresponding failure load is determined by the structure geometry and structural element capacities - this is relatively easy to calculate. The upper bound theorem states that the failure load of each of these failure modes will always be greater than or equal to the actual failure load of the structure for that specific load. The result is that the upper bound theorem allows us to calculate upper bounds on the strength of a structure, even if we don't know the specific failure mode of the structure.

In the figure below, the required force to induce failure will differ between failure mode A and failure mode B. Of these two failure modes, both will be upper bounds to the actual failure load of the structure, but the lower failure load will be a better approximation.
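As a worked illustration of the upper bound idea, consider a textbook case rather than the structure in the figure: a beam fixed at both ends, with a single point load at midspan. Assume a failure mode with plastic hinges at both supports and a third hinge at some chosen distance from the left support. A virtual work calculation then gives a collapse load of 4Mp divided by the distance from the third hinge to the nearer support. The span, plastic moment and candidate hinge positions below are invented numbers, so this is a sketch under those assumptions only.

```python
def mechanism_collapse_load(hinge_position, span, plastic_moment):
    """Upper-bound collapse load for an assumed three-hinge mechanism.

    Assumed case: fixed-ended beam, point load at midspan, plastic hinges
    at both supports and at hinge_position (measured from the left support).
    Virtual work gives P = 4 * Mp / (distance from hinge to nearer support).
    """
    if not 0.0 < hinge_position < span:
        raise ValueError("hinge must lie strictly between the supports")
    return 4.0 * plastic_moment / min(hinge_position, span - hinge_position)


SPAN = 8.0              # metres (hypothetical)
PLASTIC_MOMENT = 100.0  # kNm (hypothetical)

# Each assumed hinge position is one candidate failure mode.
candidate_positions = [2.0, 3.0, 4.0, 5.0]
upper_bounds = {
    x: mechanism_collapse_load(x, SPAN, PLASTIC_MOMENT)
    for x in candidate_positions
}

# Every candidate is an upper bound, so the smallest one is the best estimate.
best_upper_bound = min(upper_bounds.values())

for x, load in upper_bounds.items():
    print(f"hinge at {x:.1f} m -> collapse load {load:.1f} kN")
print(f"best upper bound: {best_upper_bound:.1f} kN")
```

Every assumed hinge position gives a load at or above the standard result for this case, 8Mp/L (100 kN with these numbers), and only the hinge under the load reproduces it exactly.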

The lower bound theorem can be stated as follows: "A collapse load computed on the basis of an assumed moment diagram in which the moments are nowhere greater than Mp is less than or equal to the true collapse load" (p 121, Ductile Design of Steel Structures).

A common language translation of the Lower Bound Theorem is this: If you assume a load path for an imposed load in a structure, and calculate the maximum load that you can apply *for this load path* without overloading any component of the structure in the load path beyond the strength of that component, then this maximum load will be a lower bound on the load required to induce collapse of the structure - ie, the real structure will definitely be able to withstand this lower-bound load, although we won't know how much higher the actual failure load will be.
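A small numerical sketch may help here, using an assumed textbook case: a beam fixed at both ends with a point load at midspan. Suppose we guess how much hogging moment the supports take. Equilibrium then fixes the sagging moment under the load at PL/4 minus that end moment, and the lower bound is the largest load for which neither moment exceeds Mp. The span, plastic moment and assumed end moments below are invented for illustration.

```python
def admissible_load(end_moment, span, plastic_moment):
    """Lower-bound collapse load for an assumed moment diagram.

    Assumed case: fixed-ended beam with a point load P at midspan, and an
    assumed hogging moment of end_moment at each support. Equilibrium gives
    a midspan sagging moment of P * span / 4 - end_moment, which reaches Mp
    when P = 4 * (Mp + end_moment) / span.
    """
    if not 0.0 <= end_moment <= plastic_moment:
        raise ValueError("assumed end moment must not exceed Mp")
    return 4.0 * (plastic_moment + end_moment) / span


SPAN = 8.0              # metres (hypothetical)
PLASTIC_MOMENT = 100.0  # kNm (hypothetical)

# Each assumed end moment is one candidate load path.
assumed_end_moments = [0.0, 50.0, 100.0]
lower_bounds = {
    m: admissible_load(m, SPAN, PLASTIC_MOMENT)
    for m in assumed_end_moments
}

# Every candidate is safe, so the largest one is the best estimate.
best_lower_bound = max(lower_bounds.values())

for m, load in lower_bounds.items():
    print(f"end moment {m:.0f} kNm -> safe load {load:.1f} kN")
print(f"best lower bound: {best_lower_bound:.1f} kN")
```

Assuming the supports carry no moment at all gives a safe but pessimistic 50 kN. Letting them work up to Mp raises the bound to 100 kN, which is the standard collapse load of 8Mp/L for this case.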

For a coarse, back-of-envelope analysis of a given structure subject to an imposed load, upper and lower bounds can easily be derived, and the structure selected to ensure that the lower bound is above the imposed load to be designed for. However, there will be a gap between the upper and lower bounds. For a large, expensive structure, such as a bridge or transmission tower, a simple structural analysis would result in a structure that was much stronger (and more expensive) than it needed to be, in order to ensure that the lower bound was above the design load of the structure. If the accuracy of the structural analysis was increased, the upper and lower bound estimates would come closer together and reduce the degree of uncertainty in the strength of the structure. Combined with an iterative design approach, this would enable the identification of a design that minimised construction material volume and costs, while still being strong enough to withstand a specified design load.
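
To make the two bounds concrete, here is a minimal sketch in Python. It does not model the structure in the figure; it uses a textbook case that I have chosen for illustration: a fixed-ended beam carrying a single point load at midspan, with the same plastic moment capacity (Mp) at every cross-section. The function names and the numbers are my own assumptions. The first function assumes a failure mode (three hinges) and returns an upper bound; the second assumes a moment diagram (a load path) and returns a lower bound.

```python
def upper_bound_load(mp, span, hinge_pos):
    """Failure load for an ASSUMED failure mode (upper bound theorem).

    Assumed mode: plastic hinges at both fixed ends and at hinge_pos,
    measured from the left support. The point load acts at midspan.
    """
    if not 0 < hinge_pos < span:
        raise ValueError("hinge_pos must lie strictly inside the span")
    theta_left = 1.0                        # unit virtual rotation at the left end
    sag = hinge_pos * theta_left            # deflection at the interior hinge
    theta_right = sag / (span - hinge_pos)  # matching rotation at the right end
    # Energy absorbed by the hinges; the interior hinge rotates by the
    # sum of the two end rotations.
    internal_work = mp * (theta_left + (theta_left + theta_right) + theta_right)
    midspan = span / 2
    if midspan <= hinge_pos:
        load_deflection = theta_left * midspan
    else:
        load_deflection = theta_right * (span - midspan)
    # Work done by the load must equal the energy absorbed by the hinges.
    return internal_work / load_deflection


def lower_bound_load(mp, span, end_moment):
    """Safe load for an ASSUMED moment diagram (lower bound theorem).

    Assumed diagram: equal hogging moments end_moment at both supports,
    so the sagging moment at midspan is load * span / 4 - end_moment.
    """
    if not 0 <= end_moment <= mp:
        raise ValueError("end_moment must not exceed the plastic moment")
    # Largest load that keeps the midspan moment at or below mp.
    return 4 * (mp + end_moment) / span


mp, span = 100.0, 8.0  # made-up values: kNm and metres
uppers = [upper_bound_load(mp, span, a) for a in (2.0, 4.0, 6.0)]
lowers = [lower_bound_load(mp, span, m) for m in (0.0, 50.0, 100.0)]
print("best upper bound:", min(uppers))
print("best lower bound:", max(lowers))
```

With these numbers, poorly chosen assumptions give a wide gap (an upper bound of about 200 against a lower bound of 50), while the best choices meet at 100. This beam is simple enough for the bounds to coincide, which is exactly what we cannot count on for a real bridge or tower.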

The fact that our understanding of materials is limited, and that our complex structural analyses are based on approximations, means that even increasingly sophisticated upper and lower bound calculations will almost never result in the same limit load - but they will still give us sufficient certainty to design a structure that is reasonably economical to construct, with a reasonable margin of safety built into the design, giving confidence in the structure's acceptability.

Limits of static analysis

An important point is that the upper and lower bound theorems are usually only applied to static structures. Once real structures are loaded past their actual strength limits, they will begin to fail. It may be possible to identify the first point of failure, but then the subsequent modes of failure may become increasingly difficult to identify, as dynamic effects (momentum and elastic energy) come into play, and the load paths in the structure change in an increasingly chaotic fashion as more and more elements fail.

The primary objective of structural engineers is to ensure that a structure never fails, and this is done by focusing on static structural analysis. This is valid, since a static structure undergoing dynamic motion has already failed, by definition. Investigating actual modes of failure is largely left as an academic question, or as an investigative exercise when something really does fail and the cause of failure needs to be determined, so as to prevent recurrence in similar structures elsewhere. The Royal Commission into the collapse of span 10-11 of Melbourne's Westgate Bridge on 15 October 1970, killing 35 men, is an excellent example of this. The purpose of the Royal Commission was to determine the mode, and ultimate cause, of the fatal collapse, and to review whether aspects of the structural design may have been undesirable. Although not available online, I have a copy of this report, and may review it in a future post.

Stress absorption and transmission

A crucial concept for understanding some of the ways in which systems can fail is the idea of transmitted stresses between system components. A system component subject to stress can either absorb that stress internally, relay that stress to connected components, or perform some combination of the two. As an example, consider the case of an individual who takes out a mortgage to purchase a house, with her parents putting their house up as a guarantee on the loan. Let's assume that there is an economic downturn, and that the individual loses her job at a technology startup. If she has some extra savings set aside, then she can react to the imposed financial stress of losing her job by dipping into her extra savings to continue to pay the mortgage while she looks for new employment. This would be an internally absorbed stress. If she has no savings set aside, however, and is forced by the bank to sell her house into a weak market for less than the value of the mortgage, then the bank could call in the mortgage guarantee and force her parents to come up with the cash to pay out the mortgage. This would be a transmitted stress, since the effect of losing her job has been for a financial obligation to be placed on her parents, by means of the mortgage and the mortgage guarantee.
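
The split between absorbed and transmitted stress can be sketched in a few lines of Python. This is a deliberately crude model of the mortgage example, with a function name and dollar figures that I have made up: the component absorbs whatever its buffer (savings) can cover and passes the remainder on. In reality, the amount passed to the parents would depend on the sale price and the outstanding loan.

```python
def apply_stress(stress, buffer):
    """Split an imposed stress into the part absorbed internally
    and the part transmitted to connected components."""
    absorbed = min(stress, buffer)
    transmitted = stress - absorbed
    return absorbed, transmitted


shortfall = 18_000  # hypothetical missed repayments while out of work

# With savings set aside, the stress is absorbed internally.
print(apply_stress(shortfall, buffer=25_000))  # (18000, 0)

# With no savings, the whole stress is transmitted to the guarantors.
print(apply_stress(shortfall, buffer=0))       # (0, 18000)
```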

Tough systems and brittle systems

Using the concept of stress absorption and stress transmission, I define a "tough" system as being one where imposed stresses are predominantly absorbed by system elements - in other words, changes in component inputs result in changes in component states, but little change in component outputs. Eventually a tough system subject to increasing imposed stresses will fail, but there will be a process of internal stress absorption, followed by stress transmission between components, before system failure occurs. A defining characteristic of a tough system is that there is a significant increase in relative stress between the level at which system adaptation begins to occur (stresses begin to be transmitted between elements) and the level at which complete failure of the system occurs. In the above example, the borrower having savings set aside would be an example of a tough system.

In a similar fashion, I define a "brittle" system as being one where imposed stresses result in stresses being transmitted between elements, rather than being absorbed within individual elements. Since stress transmission occurs readily, increased imposed stresses will quickly lead to changes in many components of the system, and hasten the complete failure of the system. A defining characteristic of a brittle system is that there is only a small increase in relative stress between the level at which stresses begin to be transferred between elements, and the level at which complete system failure occurs. In the above example, the borrower having no savings set aside to cope with unexpected financial problems would be an example of a brittle system, since an imposed financial stress would propagate readily.

A brittle system is not necessarily more prone to failure than a tough system, since failure of either system depends on the likelihood of a stress greater than the failure stress occurring, and this is dependent on the system environment. The difference is in the response of the two types of system to increasing stresses. Any increase in stress in a brittle system will lead the system towards failure more quickly, and there will be fewer warnings of impending system failure.
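
One way to put a number on the difference is to compare the stress at which transmission begins with the stress at which the whole system fails. The sketch below is my own simplification, not a standard measure: it assumes a simple chain in which each component absorbs stress up to its capacity and passes any overflow to the next, and the system fails once the last component overflows. The function name and capacities are hypothetical.

```python
def stress_thresholds(capacities):
    """Return (onset, failure, relative_margin) for a chain of components.

    onset:   stress at which the first component starts transmitting
    failure: stress at which every component's capacity is exhausted
    relative_margin: extra stress between onset and failure, as a
                     fraction of the onset stress
    """
    if not capacities or capacities[0] <= 0:
        raise ValueError("need at least one component with positive capacity")
    onset = capacities[0]
    failure = sum(capacities)
    relative_margin = (failure - onset) / onset
    return onset, failure, relative_margin


tough = stress_thresholds([10, 10, 10, 10])  # plenty of spare capacity downstream
brittle = stress_thresholds([10, 1])         # almost none

print(tough)    # (10, 40, 3.0)
print(brittle)  # (10, 11, 0.1)
```

Both systems start transmitting stress at the same level, but the tough one tolerates a further 300% increase before failing, while the brittle one fails after a further 10%. That small margin is why a brittle system gives so little warning.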

Efficiency

I'll define a system as being efficient if the specified level of system service cannot be provided for a lower cost, whether in terms of money, resources, people or capital. Since greater capacity in a system generally requires more money, resources, people or capital, an efficient system is likely to be made up of components operating close to their individual capacities, and thus an efficient system will be brittle.

Redundancy in systems

Returning to a high-level perspective, our civilisational systems are often complex. When operating normally, they provide specific system services - such as transporting cars across a river as over Westgate Bridge, conducting financial transactions between individuals as in a financial system, conveying phone conversations between two mobile handsets as in telecommunications systems, permitting decisions to be made on public infrastructure as in our political system, and so on. Each of these systems is made of many subcomponents. Some components can fail without imperilling the system (redundant components), while other components are essential (essential components) and their failure will mean at least partial failure of the overall system.

Redundancy is a separate property to stress absorption/transmission, brittleness/toughness and efficiency - it is the sensitivity of a system to the failure of a specific component. It does not express the actual likelihood of that failure occurring. When discussing the effects of changes in inputs to a system, toughness and brittleness are much more meaningful characteristics for investigation. However, when considering factors that might cause the failure of a specific component, system redundancy in that component becomes relevant.

When components of our systems fail, then the redundancy of the overall system in those components can vary - from complete redundancy (other components will take over the role of the failed component, and the system will continue to operate with no loss of functionality) through partial redundancy (other components will take over some of the roles of the failed component, but not all - and the overall system will exhibit degraded performance) through to no redundancy (the overall system loses all functionality).

It is not necessarily a problem if a system has low redundancy in a particular component - what matters is the risk that a particular component will fail, combined with the redundancy of the system in that component. If a component has a high risk of failure, but the system is redundant in that component, then the risk posed to the system is relatively low. If the component has a high risk of failure and the system is not redundant in that component, then the risk posed to the system is high.
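
A rough Python sketch shows how component risk and redundancy combine. It rests on two assumptions of my own that real systems often violate: the redundant units are fully interchangeable, and they fail independently of each other (no common cause). The function name and probabilities are illustrative only.

```python
def system_failure_probability(component_failure_prob, redundant_units):
    """Chance that the system loses this function entirely, i.e. that
    every interchangeable unit fails, assuming independent failures."""
    if not 0 <= component_failure_prob <= 1:
        raise ValueError("probability must be between 0 and 1")
    if redundant_units < 1:
        raise ValueError("need at least one unit")
    return component_failure_prob ** redundant_units


high_risk = 0.2  # hypothetical chance that a single unit fails

no_redundancy = system_failure_probability(high_risk, redundant_units=1)
triple_redundancy = system_failure_probability(high_risk, redundant_units=3)
print(no_redundancy, triple_redundancy)
```

Under these assumptions the same high-risk component threatens the system with a probability of 0.2 when there is no redundancy, and of about 0.008 when there are three interchangeable units.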

The concept of redundancy also provides a way of evaluating the risk posed to a system by something that is not explicitly considered in a system model, such as error. If a system has little or no redundancy in a component, it would be wise to investigate whether there is anything that might trigger the failure of that component, and to ensure that strategies are employed to reduce or eliminate that risk. This technique is particularly valuable if the system model used to assess system stability against imposed stresses does not include factors that pose a risk to non-redundant components.

The NAB batch payment failure on the night of 25 November 2010 illustrates these ideas. Although a comprehensive account of the causes of the failure is not available, the cause appears to have been that a corrupted batch payments file was submitted to their central database, and that errors caused by the corrupted file then propagated through the rest of the banking system. This resulted in the freezing of the accounts of firms, the non-payment of wages and the double charging of some account holders for transactions on their accounts. I think it is reasonable to describe this result as a partial failure of the Australian banking system, so the Australian banking system can be described as having no redundancy in NAB's batch payment file processing against partial failure.

NAB should have had a method either of ensuring that batch payment files submitted to their database were not corrupted, or of quarantining database outputs (and maintaining progressive, restorable backups of database status) until they had certainty that earlier inputs to the database were valid and the results of internal processing correct, whereupon onward processing to the Reserve Bank and to other banks could be released from quarantine and sent. It would be of great interest to know more about the causes and chronology of this partial failure of the banking system.
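
As a sketch of the general idea only (I have no knowledge of how NAB's systems actually work), the Python below holds downstream outputs in quarantine unless the submitted file matches a SHA-256 digest supplied by its sender. The function names, the file contents and the output labels are all hypothetical. A checksum like this only catches corruption in transit or storage; it cannot catch a file that is intact but wrong.

```python
import hashlib


def file_digest(payload: bytes) -> str:
    """SHA-256 fingerprint of a batch file's contents."""
    return hashlib.sha256(payload).hexdigest()


def release_or_quarantine(payload, expected_digest, pending_outputs):
    """Release downstream outputs only if the input file is intact;
    otherwise hold everything back for investigation."""
    if file_digest(payload) == expected_digest:
        return {"released": list(pending_outputs), "quarantined": []}
    return {"released": [], "quarantined": list(pending_outputs)}


original = b"PAY,ACME PTY LTD,1500.00\n"    # made-up batch file contents
digest_from_sender = file_digest(original)  # sent alongside the file
outputs = ["to central bank", "to other banks"]

corrupted = b"PAY,ACME PTY LTD,15\x0000.00\n"
print(release_or_quarantine(original, digest_from_sender, outputs))
print(release_or_quarantine(corrupted, digest_from_sender, outputs))
```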

Feedback loops

Feedback loops occur when there is a cyclic loop within a system. Feedback loops can be negative or positive - negative feedbacks serve to stabilise a system, while positive feedbacks serve to destabilise it. Feedback loops are common in many systems. As an example, in financial systems, asset prices are dependent on investor perceptions of value, investor perceptions of value are influenced by the observed behaviour of other investors, and the behaviour of those investors is influenced by asset prices. Whether the feedback loop is positive or negative depends on the interactions between asset prices, perceptions and investor behaviour. Hearing aids are subject to "squealing", which is due to acoustic feedback, and the throbbing of poorly adjusted cantilever brakes on bikes is also due to a feedback effect.
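
The difference between the two kinds of loop can be seen in a toy Python model. It is the simplest linear case I could think of, not a model of any real market: at each step the system's deviation from equilibrium is fed back into itself, multiplied by a gain. The function name and gain values are my own assumptions.

```python
def simulate_feedback(initial_deviation, gain, steps):
    """Track a deviation from equilibrium over time, where each step
    adds gain * (current deviation) back into the deviation."""
    deviation = initial_deviation
    history = [deviation]
    for _ in range(steps):
        deviation = deviation + gain * deviation
        history.append(deviation)
    return history


# Negative feedback: each step cancels half of the disturbance.
print(simulate_feedback(1.0, gain=-0.5, steps=3))  # [1.0, 0.5, 0.25, 0.125]

# Positive feedback: each step amplifies the disturbance by half.
print(simulate_feedback(1.0, gain=0.5, steps=3))   # [1.0, 1.5, 2.25, 3.375]
```

In this toy model a gain between -1 and 0 makes the disturbance die away, while any positive gain makes it grow without limit.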

Systems can be designed to adapt themselves so that positive feedbacks are controlled - this requires a system where the function of the system changes in response to the system's internal state. This is a complex subject, and a proper treatment would make this essay unacceptably long! However, an understanding of feedback processes is often essential to understanding system behaviour.

Consequences of failure

When considering the possibility of the failure of a system, the consequences of that failure are an important consideration. The degree of effort expended on preventing a system failure should be related to the consequences of the system failure. For a system like Facebook, the consequences of a system failure are unlikely to be particularly significant, beyond missed social engagements and lost contact information. For a system like a vaccine development program, the failure of the vaccine program may lead to a higher chance of death from the disease that the vaccine is intended to prevent, or to significant adverse effects on vaccinated individuals, resulting from unexpected side effects of the vaccine. For a system like an Airbus A380 or a skyscraper, failure may mean the deaths of large numbers of people. The amount of effort devoted to systems design should be proportional to the consequences of the failure of that system.

Conclusion

I will wrap up by making two points. First, a good understanding of the system being discussed is an essential prerequisite to meaningful and intelligent discussion of that system. If one does not understand the workings of a system, then one cannot have confidence in the value of one's commentary on that system.

Second, while it is valuable to be critical of human-created systems - if those criticisms are well founded - it is also important to make suggestions on how systems can be modified, either to reduce the possibility of failure, or to reduce the consequences of failure. In other words, it is good to criticise, but just as important to say how something can be improved.

My next few posts are intended to be financial, but I'll detour as interest dictates! Going by the volume of text file and notebook jottings, I'm not expecting to run out of ideas any time soon. I may not be as prolific a poster as some other financial commentators - but I hope that each post will be a considered and thought-provoking essay.

I hope that you've all enjoyed reading this first Systems and Limits post - and that it brings you back for more!

Cheers

Andrew
