Tuesday, September 18, 2007

Classifying causes

I have had a comment on my last post regarding the Virgin train crash. It questions whether there are problems with accident investigation because we have not properly defined the terminology used to describe different types of cause (e.g. "immediate" and "underlying").

My view is that these classifications of cause do not give us the range needed to explain an accident fully. The reality is that many different types of failure can contribute to an accident, and each of these failures may have multiple causes. Most accidents start with a combination of technical, human and organisational failures that create a hazardous situation. This already highlights the complexity. For example, a human error can be an "immediate" cause of an accident, but it can also cause a technical failure, in which case the human error would be an "underlying" cause.

This is further complicated by the fact that a hazardous situation does not necessarily result in an accident. If the situation has been predicted, defences can be put in place. Only if these fail do you have a developing incident. Even then there are opportunities to recover the situation. Failure to recover results in an accident, whilst successful recovery means it is a near miss.
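The sequence described above can be written down as a small decision function. This is only a rough sketch of my reading of it: the names `Outcome` and `classify` are invented for illustration, and I have assumed that a situation with no defences at all is treated the same as one where the defences failed.

```python
from enum import Enum


class Outcome(Enum):
    NO_INCIDENT = "no incident"
    NEAR_MISS = "near miss"
    ACCIDENT = "accident"


def classify(hazard_present: bool, defences_held: bool, recovered: bool) -> Outcome:
    """Classify an event using the hazard -> defences -> recovery sequence.

    Assumption: defences_held is False both when defences failed and when
    none were in place because the situation was never predicted.
    """
    # No hazardous situation, or the defences worked: nothing develops.
    if not hazard_present or defences_held:
        return Outcome.NO_INCIDENT
    # A developing incident that is recovered is a near miss.
    if recovered:
        return Outcome.NEAR_MISS
    # Failure to recover results in an accident.
    return Outcome.ACCIDENT


if __name__ == "__main__":
    print(classify(hazard_present=True, defences_held=False, recovered=True).value)
```

The point of the sketch is that "immediate" and "underlying" do not appear anywhere in it: the outcome depends on three separate questions, each of which can have its own set of failures behind it.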

If I look at an accident I tend to start by thinking "what failures resulted in a hazardous situation developing, were there potential defences and did they fail, and could the situation have been recovered?" This gives me a set of failures (you may call them "immediate" causes) that require analysis. These can then be broken down, for example using "why trees", until the root causes are found. In this case the root causes are the points at which the "why tree" cannot be broken down any further.
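To illustrate the idea that root causes are where the "why tree" stops, here is a minimal sketch that treats the tree as nested nodes and collects the leaves. The class `Why`, the function `root_causes` and the example failures are all hypothetical; they are not taken from any real investigation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Why:
    """One node in a why tree: a failure and the answers to 'why?'."""
    statement: str
    because: List["Why"] = field(default_factory=list)


def root_causes(node: Why) -> List[str]:
    """Return the leaves of the tree, i.e. nodes with no further 'why'."""
    if not node.because:
        return [node.statement]
    causes: List[str] = []
    for child in node.because:
        causes.extend(root_causes(child))
    return causes


# Hypothetical example tree, invented purely for illustration.
example = Why("Hazardous situation developed", [
    Why("Component failed", [
        Why("Design did not allow for wear"),
    ]),
    Why("Procedure not followed", [
        Why("Procedure unclear"),
        Why("Time pressure"),
    ]),
])

if __name__ == "__main__":
    for cause in root_causes(example):
        print(cause)
```

In practice the judgement lies in deciding when a branch really cannot be broken down any further; the code only records that decision.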

My view of failure types comes from a model I first saw in the PhD thesis of Tjerk van der Schaaf, who is now a professor at Eindhoven University. You can see the model reproduced in another thesis (see figure 1.1 on page 6).

Andy Brazier

Friday, September 14, 2007

Virgin Rail crash, February 2007

A summary of Network Rail's investigation into the Virgin rail crash that killed one person was released on 4 September 2007. It is available from here.

The report uses a lot of railway jargon that I am not familiar with.

The conclusions identify the immediate cause as the deterioration of components in the stretcher bar system on the points. The underlying cause was a failure to carry out an inspection that would have identified the fault.
* Deficiencies in the asset inspection and maintenance regime employed on Lancs & Cumbria maintenance area resulted in the deterioration of 2B points not being identified. These deficiencies included:
  * A breakdown in the local management/supervisory structure that leads, monitors and regulates asset inspection and maintenance activities;
  * A systematic failure in the track patrolling regime employed on the local area;
  * The issue and subsequent briefing of mandated standards not being carried out in a robust and auditable manner;
  * A lack of sample verification to test the quality and arrangements for inspections undertaken.

I find this quite bizarre. Failure to inspect something does not cause it to fail. Yes, an inspection may allow a hazard to be discovered before an accident occurs, but that is not the same thing. It sounds to me like Network Rail are trying to distract us from more fundamental problems with the design of points, especially given that we still do not know what caused the Potters Bar train crash, which also involved a failure of points.

In fact, reading more of the report into this crash, it seems design issues were raised, and most of the action items are focussed on these types of issue. In my opinion this makes it even more strange that the conclusions in the report (which are probably all that most people will read) are so focussed on inspection.