Thanks to everyone who has commented on my first post! Taking a quick look over the stats, I'm impressed that at least one individual managed to read the entire thing on an iPhone display, and there was also a page view from the UK - where'd that come from? So far, circulation has mostly been restricted to immediate acquaintances, so most commentary has been by direct email. I think people have been receptive to the ideas described, but I suspect it was also considered a rather long (and perhaps overly academic?) post. So I'll try to keep future posts a bit shorter, and also bring in current events and examples to make them a little less abstract.
Summary of last post
For those who are coming in a bit later, the first post was an in-depth discussion of several ideas and perspectives on systems behavior. The central ideas were that systems can be simulated using system models, and the appropriateness of the model depends on the system behavior under investigation. I then introduced the concept of tough and brittle systems - tough systems are capable of absorbing stresses by internal adaptation while still providing their "system service" - while brittle systems subject to increasing stress will reach the failure point much more quickly. Both types of system will fail when subjected to high enough stresses. A subtle point is that whether that failure occurs depends on the proportional increase in stress. If you have a brittle system which is only subjected to small increases in stress, then it may turn out to be a durable (long lived) system, while a tough system subject to several orders of magnitude increase in stress will probably fail.
I also introduced the ideas of systems redundancy in components - if a system has no redundancy in a component (the system will fail if the component fails) then the threats to the integrity of that component should be subject to greater scrutiny. Feedback loops (positive and negative), as well as the importance of considering the consequences of failure of a system at design time, were also discussed.
New idea 1 - Inertial vulnerability
I've been thinking about these ideas a little further since, and about the idea that systems are not static - they can be deliberately changed in response to perceived threats or changes in the environment. Some systems can be changed much more rapidly than others - a computer network, or an airline route network, can be reconfigured quickly. Other systems cannot be changed on timescales of less than years, or even decades - for instance, the big mining and power generation firms have infrastructure costing billions of dollars to construct, and which has to be depreciated over time periods as long as several decades before it can be retired. The Australian Navy's submarine fleet is a similar example - the Oberon class was in use from 1967 to 2000, and the Collins class submarines, commissioned between 1996 and 2003, are intended for use until the 2020s. The replacements for the Collins submarines, intended to be in use from 2025 until the 2070s, are already being planned. Just three classes of submarine will cover over a century of operational use.
Inertial vulnerability is when a system, or some component of a system, is restricted to very slow, or significantly delayed, rates of change, meaning that it can easily be rendered inappropriate or obsolete by a changed environment. Any system which is totally dependent on a specific future, or narrow range of futures, coming to pass, and which involves obligations, infrastructural commitments or significant loan repayment periods that require years or decades to resolve, is inertially vulnerable.
New idea 2 - Multiple valid representations
An additional idea is that systems can have multiple, equally valid, representations - depending on behavior of interest and the defined system objective. Let's take CityLink as an example - as a privately owned firm floated on the sharemarket, the predominant purpose of the company for the owners is as a vehicle to earn income. The provision of a toll road to drivers is just the means to that end. However, for drivers, the purpose of the company is to provide a time saving toll road - and the money paid for the service is the means to receiving that system service. The system objectives of the owners and of the travelling public can thus be seen as co-existing mirror images of each other - neither can achieve their system objective without the involvement of the other.
Alternatively, CityLink might be viewed (by someone interested in energy consumption) as providing a service that requires drivers to consume a given quantity of fossil fuels in order to realise a given quantity of system services. The energy analyst might ask questions about how much more efficient cars can become in their energy consumption, as a way of evaluating CityLink's vulnerability to a fossil fuel scarcity. A union organizer, on the other hand, might be more interested in how many staff they employ, and financial flows through the organization - they might look at all income to the company, and expenditures - and then investigate the distribution of the wage bill amongst different levels of management. They might use this representation to point out that more money can be paid to front line staff by reducing the size of management, or the size of management salaries.
These representations of the same system are constructed for different purposes and are very different as a result - but are equally valid. So, systems can have multiple and equally valid system models - the appropriate choice of system model will depend on one's personal position and the subject or system behavior of interest. The basis for a system model will generally start from some limited resource used by the system - energy, money and the workforce are some of the many possibilities for a toll road company.
What systems do we have that demonstrate efficiency?
And now it's time for this blog to start getting real! It's all very well to set forth a beautiful theory, but pointless if I can't relate it to the real world, and use it to understand real systems better. The big idea, so far, has been that of efficient systems being brittle. So, it's time to go system hunting! Can we find examples of systems that demonstrate some of these ideas?
The first one that comes to mind is the airlines. Airlines are exposed to the following risks:
* New technology without a proven history of reliability
* Large numbers of assets costing in the order of one hundred million dollars each
* Organizational complexity
* A large and highly skilled workforce
* Complex logistical operations
* Significant exposure to energy prices
* Long lead times on fleet planning
* Complex asset maintenance requirements
* Different legislative requirements for each country of operation
* Intense competition with other airlines for passengers
Taking Qantas as an example, the airline has a fleet of 135 aircraft as of February 2011, including 9 Airbus A380s and 38 Boeing 737-800s (most numerous aircraft type). The Airbus A380 list price is US$375.3 million, with the Boeing 737-800 list price being US$80.8 million. The replacement cost of these 47 aircraft - approximately one third of the Qantas fleet - would be approximately six and a half billion dollars. According to the Qantas Data Book 2010, the total assets of the Qantas Group in 2010 were stated as AU$19.9 billion, against annual revenue of AU$13.8 billion. Staff and fuel bills (AU$3.4 and AU$3.3 billion respectively) each made up approximately one quarter of the operating costs of the airline - but profit (after tax) for the year was only AU$116 million, less than 1% of revenue, and less than 1/3 the cost of a single A380! A blowout of about 3.5% in either the wage or fuel bill would be enough to wipe out the year's profit. For this reason Qantas engages in complex fuel bill and exchange rate hedging strategies to try to protect its profit margins against fluctuations in the exchange rate and fuel costs - but these strategies can only provide partial protection in a high risk environment. Hedging strategies don't protect Qantas against the drop in air travel that would result from a sustained increase in oil prices, which would erode the financial capacity of the public to spend money on air travel by increasing the costs of many other goods and services.
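For anyone who wants to check the arithmetic above, here is a minimal Python sketch using only the figures quoted in this post. The function and constant names are my own, and it makes one simplifying assumption: the after-tax profit is treated as the whole buffer available to absorb a cost blowout, ignoring any tax effects. Note that the aircraft list prices are in US dollars while the Data Book figures are in Australian dollars, so the two are kept separate.

```python
# Figures quoted in the post (list prices in US$ million, Data Book figures in AU$ million).
A380_COUNT, A380_LIST_PRICE_USD_M = 9, 375.3
B738_COUNT, B738_LIST_PRICE_USD_M = 38, 80.8
FLEET_SIZE = 135

REVENUE_AUD_M = 13_800
PROFIT_AFTER_TAX_AUD_M = 116
STAFF_BILL_AUD_M = 3_400
FUEL_BILL_AUD_M = 3_300


def replacement_cost_usd_m():
    """List-price replacement cost of the A380s and 737-800s, in US$ million."""
    return A380_COUNT * A380_LIST_PRICE_USD_M + B738_COUNT * B738_LIST_PRICE_USD_M


def fleet_share():
    """Fraction of the fleet made up by those two aircraft types."""
    return (A380_COUNT + B738_COUNT) / FLEET_SIZE


def profit_margin():
    """After-tax profit as a fraction of annual revenue."""
    return PROFIT_AFTER_TAX_AUD_M / REVENUE_AUD_M


def blowout_to_erase_profit(bill_aud_m):
    """Fractional rise in one cost bill that equals the year's profit.

    Simplifying assumption: the after-tax profit is the only buffer.
    """
    return PROFIT_AFTER_TAX_AUD_M / bill_aud_m


print(f"Replacement cost: US${replacement_cost_usd_m():,.1f} million")
print(f"Share of fleet:   {fleet_share():.1%}")
print(f"Profit margin:    {profit_margin():.2%}")
print(f"Wage blowout to erase profit: {blowout_to_erase_profit(STAFF_BILL_AUD_M):.2%}")
print(f"Fuel blowout to erase profit: {blowout_to_erase_profit(FUEL_BILL_AUD_M):.2%}")
```

Both blowout figures come out within a small fraction of a percentage point of 3.5%, which is where the figure in the paragraph above comes from.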
In order to remain profitable, airlines need to be constantly pursuing efficiency improvements in fuel usage, staff efficiency, aircraft costs, seat occupancy rates and so on, while also keeping prices low enough to compete with other airlines for market share. During the 2009/10 financial year, 82.5% of seats on Qantas Group aircraft were revenue generating, meaning that they were occupied by paying passengers. Revenue generating seat percentages are kept as high as possible through the practice of overbooking and constant adjustment of flight schedules - I suspect it has become increasingly common for flights to be cancelled, when it is possible to accommodate all affected passengers with spare capacity on other flights. Likewise, Qantas's move to the A380 has been driven by the increased efficiency of the aircraft - it burns approximately 10% less fuel per passenger than the 747. However, this is offset against the significant risk involved in any shift to a new aircraft and new technology, and also against the requirement to fill each aircraft with a much larger number of paying passengers in order to realize the potential efficiency gains.
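The point about needing to fill the larger aircraft can be made concrete with a rough sketch. This is my own back-of-envelope model, not anything from Qantas or Airbus, and it rests on two loud assumptions: that the 10% fuel saving applies per seat when both aircraft are equally full, and that fuel burn does not change with the number of passengers on board. The function names are hypothetical.

```python
QANTAS_LOAD_FACTOR = 0.825  # share of revenue generating seats, 2009/10 (from the post)
A380_FUEL_SAVING = 0.10     # fuel saving vs the 747 (from the post)


def fuel_per_passenger(fuel_per_seat, load_factor):
    """Fuel burnt per paying passenger, given fuel per seat and the fraction of seats sold.

    Assumes fuel burn is independent of how many seats are occupied.
    """
    if not 0 < load_factor <= 1:
        raise ValueError("load_factor must be in the range (0, 1]")
    return fuel_per_seat / load_factor


def break_even_load_factor(baseline_load_factor, fuel_saving):
    """Load factor at which the newer aircraft only matches the baseline's fuel per passenger.

    Setting (1 - s) * f / L_new equal to f / L_base gives L_new = (1 - s) * L_base.
    """
    return baseline_load_factor * (1 - fuel_saving)


threshold = break_even_load_factor(QANTAS_LOAD_FACTOR, A380_FUEL_SAVING)
print(f"Break-even load factor: {threshold:.2%}")
```

Under those assumptions, the fuel advantage disappears entirely if the A380 flies with fewer than about three quarters of its seats sold, against a 747 at the fleet average of 82.5%. The margin for error is not large.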
The much-publicised Rolls Royce Trent 900 engine failure on Qantas flight 32 from Singapore on 4 November 2010 also provides an informative insight into the risks posed by new aircraft and new engines. The incident, which had the potential to cause the loss of the aircraft, led to the grounding of all six of Qantas's A380s for 23 days while the cause of the engine failure was investigated. The event exposed Qantas to significant financial losses, which had the potential to wipe out their profit margin. Qantas subsequently filed a statement of claim against Rolls Royce for financial losses due to the engine failure, which were estimated to be around $60 million in costs and lost revenue.
Given that this blog has an emphasis on systems, and both aircraft and jet engines are complex systems in their own right, the failure of the number 2 engine on Flight QF32 is worth closer examination, as is the contractual relationship between Qantas and Rolls Royce. It used to be that jet engines from engine manufacturers were bought as part of the aircraft, and owned by the airline or the aircraft leasing company - but it is now a common arrangement for the airline to rent the engines from the engine manufacturer, paying a rental rate based on engine usage. In effect, this means that the relationship between the airline and engine manufacturer has changed from one of engine purchaser and engine retailer to one of propulsion service user and propulsion service provider. The benefit of this approach for the airline is that they can reduce their financial risk as they no longer need to buy engines outright, and can pay based strictly on usage, which helps control costs when there is a drop in air travel. On the other side, the engine manufacturer has a regular income stream, but now carries the risk of a downturn in air travel reducing engine usage and therefore their rental income.
There are two engine manufacturers making engines for the A380 - Rolls Royce, and Engine Alliance (a joint venture between General Electric and Pratt & Whitney). The airlines using the A380 prefer to have at least two manufacturers making engines for the A380, as competition for market share amongst engine manufacturers helps to keep engine prices low. If there were only one manufacturer and they were abusing their dominant market position, it could take in the order of five years for an alternative engine to be designed and built by another manufacturer (the Trent 900 took 8 years to design and build), which would impose significant costs and losses on the airlines and the aircraft manufacturer. However, there is still significant competition between the two engine manufacturers for market share - given the very high fixed costs of an engine development program, a small increase in market share can correspond to a significant increase in profits. Consequently, there is great financial pressure for the manufacturers to produce the lightest, most efficient, most reliable and most powerful engines possible, with the result that there is considerable pressure to push engine design to the absolute limit of safety.
At this point it is worth diverting briefly and explaining the basic function of a jet engine, so as to better describe the current state of the art of modern jet engine design. The basic principle is that air flows into the front end of the engine, where the low pressure compressor blades (the large, prominent blades visible from the front of the engine) are located. The purpose of the low pressure compressor is to begin the first stage of compressing air flowing into the engine, so that when it reaches the combustion chamber, it is pressurized. Jet fuel is injected into this airstream in the combustion chamber, where it burns. For physics reasons to do with the air velocities involved, the resulting hot mixture of air and burnt fuel flows out the back of the engine, rather than out the front. It is at very high temperature (since a higher engine combustion temperature corresponds to increased engine efficiency) and very high velocity. This high velocity is what generates the engine thrust. As it flows out the back, it also flows over turbine blades which are mounted on shafts that run through the engine and are connected to the compressor blades in the engine intake. The hot gas flowing over the turbine blades makes them turn and drives the compressor - so the engine performs the neat trick of both generating thrust and also the power required to keep the jet generation process working.
The power of a modern jet engine is extraordinary - this YouTube video shows what a jet engine at full power can do to a light truck in the wrong place! The Trent 972B (used on Qantas' A380s) generates over 36 tons of static thrust at full power, which comes from throwing large quantities of air backwards at very high speed. There are two essential aspects to making a jet engine as efficient and powerful as possible - one is to burn the fuel as hot as possible, the other is to lose as little kinetic energy from the airstream as possible as it flows over the turbine blades, so as to maximise engine thrust. Even though the turbine blades are made of proprietary titanium-nickel-aluminum alloys which are super strong, the temperature of the hot gas is greater than the melting temperature of the alloy! Obviously, without some means of managing this problem, the turbine blades aren't going to live very long when the engine is running. The solution is to have a network of fine holes inside the turbine blade itself, which bleed cooling air from elsewhere in the engine over the surface of the turbine blade, creating a thin cushion of air that insulates it from the hot gas. Even then, this still isn't enough to permit the alloy blades to survive for long in this high pressure and high force environment - the blades also need to be grown as a single crystal, to eliminate inter-crystal boundary weaknesses from the metal! The fact that turbine blade technology has been taken to this extraordinary extent demonstrates the limits to which the engineering and metallurgy have been pushed in order to make efficient, high power jet engines feasible. It also hints at the narrowness of the dividing line between an engine which is operating normally, and one which fails - because a modern jet engine is so efficient and has so many parts functioning near to the absolute limits of their structural capacities, the failure of just about any component will cause the failure of the engine.
During certification, a Trent 900 was subjected to a test in which the engine had an explosives package attached to the root of one of the compressor blades, and was run at full power. The explosives package was then detonated to simulate a bird strike. This is video of the test. The purpose of this test was to provide assurance that a bird strike would not result in components being ejected from the engine casing, and threatening the rest of the aircraft. In this case, the engine passed. However, expensive engine tests like this can only be justified for reasonable scenarios that might be expected to occur in use - it is not possible to test every single possible risk scenario. A bird strike is obviously a highly likely scenario, so this was tested for.
On QF32, the initial cause of the failure was the failure of an oil supply pipe leading to a high pressure bearing within the turbine. A connection had been drilled slightly off centre during the manufacturing process, so that the wall of the connection was too thin to resist fatigue cracking. When fatigue fracture of the oil pipe occurred, presumably some time after QF32's takeoff from Singapore, oil flowed out of the failed connection into places in the engine where it didn't belong - where it burnt and applied extra heat to components that were subsequently forced beyond their material limits, and failed. The subsequent holes in the engine casing, wing, wing flaps, wing spar and fuselage of the A380 were all created by bits of disintegrating turbine being flung out from the engine at extremely high speed. These photos show the extent of the damage to the aircraft.
It appears that the risk of turbine blade failure was intended to be controlled by proper engine design and manufacture so as to prevent such a scenario occurring, since the engine casing on QF32 was clearly unable to retain the bits of disintegrating turbine. The misdrilled pipe connection, and subsequent oil leak, was all it took to push this modern jet engine, representing the absolute pinnacle of mechanical and materials engineering design, outside a safe operating condition - with spectacular results that came near to causing the loss of QF32.
To illustrate another example of airline risk, the April 2010 eruption of Eyjafjallajökull forced the grounding of many flights through European airspace, due to the threat posed to jet engines by airborne ash. These flight restrictions were estimated to be costing airlines approximately US$400 million per day - unlike the Trent 900 failure on QF32, affected airlines were unable to recover costs through legal action. Likewise, the grounding of all commercial aircraft in the United States for several days after 11 September 2001, and subsequent longer term changes in travel patterns, imposed significant financial losses on many domestic US airlines. These exacerbated existing financial difficulties, and pushed many of them closer to bankruptcy.
In order to reduce costs as far as possible, airlines make assumptions about their future operating environment, and invest in aircraft on the basis of those assumptions. Getting these assumptions wrong can and does lead to the failure of an airline. A contributing factor in the failure of Ansett in March 2002, apart from a large wage bill, was that they were flying too many different types of aircraft, which imposed higher maintenance costs than other airlines with less diverse fleets.
The tight financial margins of the airline industry are beautifully illustrated by a (possibly apocryphal) quip on how to become a millionaire, attributed to either Richard Branson or Warren Buffett - "Become a billionaire, then buy an airline".
This discussion shows that the airlines are exposed to an extraordinary array of business, travel pattern, financial, energy cost, asset maintenance and technological risks, and that modern jet engines operate near to the absolute limits of what is possible, due to the constant quest for greater efficiency. It doesn't take much at all to push an airline or an aircraft into failure - they are truly "brittle" systems, in the sense discussed in the introductory post.