Wednesday, March 16, 2011

Three Mile Island

This is one of the classic human error examples. With the problems in Japan at the Fukushima Daiichi plant following the earthquake and tsunami, the BBC has given a good summary of what happened at 4am on Wednesday 28 March 1979.

A relatively routine malfunction in a non-nuclear system caused a relief valve to open, releasing coolant from the core. The valve should have closed after a moment, but it didn't, and a large volume of coolant escaped.

There was no straightforward way for the plant's operators to know that the valve was the problem. No instrument on their control panel indicated whether it was open or closed.

Operators knew something was going wrong, though - alarms sounded and lights were flashing.

They mistakenly diagnosed the issue as being too much coolant in the pressuriser and shut off the emergency core cooling system, the first in a series of missteps that escalated the crisis.

"In not knowing what was going wrong and taking exactly the wrong action, they exacerbated the problem by orders of magnitude," says J Samuel Walker, a historian who worked for many years for the Nuclear Regulatory Commission (NRC), the US atomic agency and nuclear watchdog.

Operators worked furiously for days to minimize the meltdown.

It wasn't until 1985, when sophisticated cameras were sent into the core, that authorities understood the enormous extent of the meltdown.

The TMI disaster took over 12 years to clean up, at a cost of about $973m (£605m).

Fortunately, little radiation was released, and multiple studies have shown no serious health impacts.

There was no documented increase in cancers. Links between TMI and problems with livestock in the area, including deaths and reproductive issues, have not been proven.

Monday, March 14, 2011

Wales earned rub of the green

This is South Wales 14 March 2011

Referring to the Six Nations rugby match between Wales and Ireland played in Cardiff on 12 March.

Wales were awarded a try following a quick throw in that used a ball different to the one that went out of play.

"A quick throw in is not permitted if another person has touched the ball apart from the player throwing it in and an opponent who carried it into touch."

Referee Jonathan Kaplan consulted Allan before awarding the score, asking him: "Was it the correct ball?"

Allan replied in the affirmative, despite the fresh ball having been thrown to Rees by a ball-boy, meaning there were not one but two breaches of the IRB's rule.

There are bound to be questions asked about how an official on the international circuit can make such a blunder.

The more charitable will accept it as human error.

But some will want to see Allan punished, perhaps confined to Scottish domestic rugby for ever and a day — Hawick v Melrose, anyone?

Friday, March 11, 2011

Bayer CropScience Pesticide Waste Tank Explosion

Details of the investigation carried out by the US Chemical Safety Board (CSB) into the runaway chemical reaction that led to a pressure vessel explosion on 28 August 2008, killing two and injuring eight. Available materials include:

* Final report
* Video on CSB website and on YouTube

The incident occurred during the restart of the methomyl unit after an extended outage to upgrade the control system and replace the original residue treater vessel. The Chemical Safety Board (CSB) investigation highlighted the following issues:

* Deviation from the written start-up procedures;
* Bypassing critical safety devices intended to prevent such a condition;
* Inadequate pre-startup safety review;
* Inadequate operator training on the newly installed control system;
* Unevaluated temporary changes;
* Malfunctioning or missing equipment, misaligned valves, and bypassed critical safety devices;
* Insufficient technical expertise available in the control room during the restart;
* Poor communications during the emergency between the Bayer CropScience incident command and the local emergency response agency, which confused emergency response organizations and delayed public announcements on actions that should be taken to minimize exposure risk.

Pre-Startup Activities

Unlike the normal methomyl restart after a routine shutdown, the August restart involved operations personnel, engineering staff, and contractors working around the clock to complete the control system upgrade and residue treater replacement. Work included finalizing the software upgrades, modifying the work station, calibrating instruments, and checking critical components. Board operators were provided time at the methomyl work station so that they could familiarize themselves with the new control functions, equipment and instrument displays, alarms, and other system features. Other personnel were completing the residue treater replacement, reinstalling piping and components, and reconnecting the control and instrument wiring. These activities progressed in parallel with the ongoing Larvin unit operation.

The methomyl control system upgrade required a revision to the SOP to incorporate the changes needed to operate the methomyl unit with the new Siemens system, and to reformat the SOP to a computerized document. However, at the time of the incident the SOP revision remained incomplete; the operators were using an unapproved SOP that did not contain the new control system operating details.

Solvent Flush and Equipment Conditioning
Many of the subsystems in the methomyl unit required a solvent flush and nitrogen gas purge to clean and dry the systems before startup. These activities were critical to safely start the residue treater system as the feed, recirculation, and vent piping had been disconnected and a new pressure vessel had been installed. The solvent-only run was also needed to verify instrument calibrations, proper equipment operating sequences, and other operating parameters in the new DCS.
The staff flushed the process equipment with solvent to remove contaminants and water that might have gotten into the system during the outage. However, contrary to the SOP, the staff did not perform the residue treater solvent run. Operators reported that solvent flow restrictions upstream impeded completion of instrument calibrations because the proper adjustments could not be made at low flow rates. Even had the staff not needed to verify the control system function and operability, the solvent run was required to pre-fill the residue treater to the minimum operating level and to heat the liquid to the minimum operating temperature before adding the methomyl-containing flasher bottoms feed.
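
The pre-fill and pre-heat requirement described above is in effect a start-up permissive: feed should not be admitted until both conditions hold. A minimal sketch of that idea in Python follows. The names `TreaterState` and `feed_permitted` and the two threshold values are hypothetical, chosen purely for illustration; they are not taken from the CSB report or from the plant's control system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TreaterState:
    level_pct: float      # liquid level as a percentage of vessel span (assumed unit)
    temperature_c: float  # liquid temperature in degrees Celsius


# Hypothetical limits for illustration only; real values would come from the SOP.
MIN_LEVEL_PCT = 50.0
MIN_TEMP_C = 135.0


def feed_permitted(state: TreaterState) -> Tuple[bool, List[str]]:
    """Return (ok, reasons): ok is True only when every permissive is met."""
    reasons = []
    if state.level_pct < MIN_LEVEL_PCT:
        reasons.append("level below minimum operating level")
    if state.temperature_c < MIN_TEMP_C:
        reasons.append("temperature below minimum operating temperature")
    return (not reasons, reasons)


if __name__ == "__main__":
    ok, why = feed_permitted(TreaterState(level_pct=10.0, temperature_c=60.0))
    print(ok, why)
```

Returning the unmet conditions alongside the yes/no answer is a deliberate choice in this sketch: an operator who is blocked can see why, rather than being tempted to bypass the check.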

Unit Restart
Although the operations staff acknowledged that management had not prescribed a specific deadline for resuming methomyl production, onsite stockpiles of methomyl necessary to make Larvin were dwindling. Unit personnel recognized the important role of methomyl in the business performance of the facility, and a recent increase in worldwide demand for Larvin created a significant, sustained production schedule. Methomyl-Larvin operating staff told CSB investigators that they looked forward to resuming methomyl production and a return to the normal daily work routine after the long unit shutdown.
Operator logs documented the plan to start the MSAO (a.k.a. Oxime) unit Monday morning, August 25. Methomyl synthesis needed to begin shortly thereafter. However, critical startup activities were not completed, and the staff struggled with many problems as they attempted to bring each subsystem on line. To complicate the startup problems, process computer system engineers had not verified the functionality of all process controls and instruments in the new control system.

Control System Upgrade
The introduction of the Siemens PCS7 control system significantly changed the interactions between the board operators and the DCS interface. The Siemens control system contained features intended to minimize human error, such as graphical display screens that simulated process flow and automated icons to display process variables. But the increased complexities of the new operating system challenged operators as they worked to familiarize themselves with the system, and the units of measurement for process variables differed from those in the previously used Honeywell system.

Human interactions with computers are physical, visual, and cognitive. New visual displays and modified command entry methods, such as changing from a keyboard to a mouse, can influence the usability of the human-computer interface and impair human performance when training is inadequate. Operators told CSB investigators they were concerned with the slower command response times in the Siemens system, and they talked about the control issues they would face during the restart of the methomyl process, which was much more difficult to control than the Larvin process. Board operators also told CSB investigators that the detailed process equipment displays in the DCS were difficult to navigate. Routine activities like starting a reaction or troubleshooting alarms would require operators to move between multiple screens to complete a task, which degraded operator awareness and response times.

The old system display and command entry was basically a spreadsheet, or line-item display. The new system used a graphical user interface (GUI) that displayed an illustrative likeness of the process and its various components (Figure 18). The board operator selected the device that needed to be changed. This made data entry clearer, but much slower. In the old system, board operators could change multiple process variables simultaneously, but they could select and change only one variable at a time in the Siemens system.

The new control system also changed how board operators monitored multiple pieces of equipment. The methomyl board operators’ station had five display screens available to monitor the methomyl processes and one display screen dedicated to process alarms. However, operating some methomyl equipment required the operators to use at least three of the five display screens. To simplify the operation, they asked the Siemens project engineers to add equipment overview screens to display multiple pieces of equipment. The board operators believed that the overview screens would provide more effective control of the unit; however, the screens were not available for the August 2008 startup.

Deadly Practices

Video from the US Chemical Safety Board (CSB). Also available on YouTube.

It shows several accidents where vented natural gas has caused fires and explosions. They include the Kleen Energy power plant and a Hilton Hotel.

Thursday, March 10, 2011

What organisations can learn from bees

British Airways in-flight 'Business:life' magazine January 2011

Management consultant Michael O'Malley lists five lessons that the workings of a beehive can teach us about organisational success:

1. Pursue common goals. Bees have reproductive success and survival as their overarching goals and deal with instances where individuals attempt to pursue their own goals that are against the interests of the group, such as when worker bees lay eggs when only the queen has that right.

2. Protect the future. Bees prefer consistent and gradual gain and avoid boom and bust. Even if they find an exceptionally good patch of flowers, they don't all rush to harvest from that location. Instead, they always have scout bees looking for new opportunities, and when times get hard they invest more resources in scouting.

3. Distribute authority. Although the queen has a prominent role, this is mainly to keep the colony calm. Most decisions are made by the workers themselves, with the most important being made by those closest to the action that have access to the best information.

4. Safeguard against catastrophic loss. Bees achieve this in three ways. First, they maintain diversity (genetically), which means they do not all act the same way and so the colony is more able to deal with all situations it encounters. Second, they work flexibly so that different bees can step in to perform essential functions if those originally in that role are lost. Third, bees know they make mistakes but they choose to make errors that are least likely to have major consequences.

5. Avoid self-inflicted death. Bees need to keep the colony clean, and so expel those that may jeopardise this (i.e. those that have a contagious infection). This is like writing a reference for people who do not fit into an organisation, rather than keeping them to inflict damage.

Bees are a good example of an environmentally friendly organisation. They perform an essential role through pollination. They never take all the pollen or nectar from a flower, because that increases regeneration time.

Friday, March 04, 2011

Buncefield: Why did it happen?

Report published February 2011 and available at the HSE website.

Report subheading: "The underlying causes of the explosion and fire at the Buncefield oil storage depot, Hemel Hempstead, Hertfordshire on 11 December 2005"

On the night of Saturday 10 December 2005, Tank 912 at the Hertfordshire Oil Storage Limited (HOSL) part of the Buncefield oil storage depot was filling with petrol. The tank had two forms of level control: a gauge that enabled the employees to monitor the filling operation; and an independent high-level switch (IHLS) which was meant to close down operations automatically if the tank was overfilled. The first gauge stuck and the IHLS was inoperable – there was therefore no means to alert the control room staff that the tank was filling to dangerous levels. Eventually large quantities of petrol overflowed from the top of the tank. A vapour cloud formed which ignited causing a massive explosion and a fire that lasted five days.
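
The gauge failure described above suggests one simple cross-check: while a tank is known to be receiving product, a level reading that stays flat is itself a warning sign. The following is a minimal sketch of that idea in Python. The function name, the sample readings, the window size and the tolerance are all assumptions made for illustration; the HSE report does not describe any such check being in place at the site.

```python
from typing import Sequence


def gauge_looks_stuck(levels_m: Sequence[float], filling: bool,
                      window: int = 5, tolerance_m: float = 0.01) -> bool:
    """Flag a possibly stuck gauge.

    Returns True when the tank is filling but the most recent `window`
    readings differ by no more than `tolerance_m` metres.
    """
    if not filling or len(levels_m) < window:
        # No judgement possible: tank is idle, or there is too little history.
        return False
    recent = levels_m[-window:]
    return max(recent) - min(recent) <= tolerance_m


if __name__ == "__main__":
    # Hypothetical readings: the level rises, then flatlines while filling continues.
    readings = [5.0, 5.2, 5.4, 5.4, 5.4, 5.4, 5.4]
    print(gauge_looks_stuck(readings, filling=True))
```

A check like this only raises a flag; it does not replace an independent high-level switch, which has to work even when the gauge and its software do not.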

The gauge had stuck intermittently after the tank had been serviced in August 2005. However, neither site management nor the contractors who maintained the systems responded effectively to its obvious unreliability. The IHLS needed a padlock to retain its check lever in a working position. However, the switch supplier did not communicate this critical point to the installer and maintenance contractor or the site operator. Because of this lack of understanding, the padlock was not fitted.

Having failed to contain the petrol, there was reliance on a bund retaining wall around the tank (secondary containment) and a system of drains and catchment areas (tertiary containment) to ensure that liquids could not be released to the environment. Both forms of containment failed. Pollutants from fuel and firefighting liquids leaked from the bund, flowed off site and entered the groundwater. These containment systems were inadequately designed and maintained.

Failures of design and maintenance in both overfill protection systems and liquid containment systems were the technical causes of the initial explosion and the seepage of pollutants to the environment in its aftermath. However, underlying these immediate failings lay root causes based in broader management failings:

* Management systems in place at HOSL relating to tank filling were both deficient and not properly followed, despite the fact that the systems were independently audited.
* Pressures on staff had been increasing before the incident. The site was fed by three pipelines, two of which control room staff had little control over in terms of flow rates and timing of receipt. This meant that staff did not have sufficient information easily available to them to manage precisely the storage of incoming fuel.
* Throughput had increased at the site. This put more pressure on site management and staff and further degraded their ability to monitor the receipt and storage of fuel. The pressure on staff was made worse by a lack of engineering support from Head Office.

Cumulatively, these pressures created a culture where keeping the process operating was the primary focus and process safety did not get the attention, resources or priority that it required.

This report does not identify any new learning about major accident prevention. Rather it serves to reinforce some important process safety management principles that have been known for some time:

* There should be a clear understanding of major accident risks and the safety critical equipment and systems designed to control them. This understanding should exist within organisations from the senior management down to the shop floor, and it needs to exist between all organisations involved in supplying, installing, maintaining and operating these controls.
* There should be systems and a culture in place to detect signals of failure in safety critical equipment and to respond to them quickly and effectively. In this case, there were clear signs that the equipment was not fit for purpose but no one questioned why, or what should be done about it other than ensure a series of temporary fixes.
* Time and resources for process safety should be made available. The pressures on staff and managers should be understood and managed so that they have the capacity to apply procedures and systems essential for safe operation.
* Once all the above are in place, there should be effective auditing systems in place which test the quality of management systems and ensure that these systems are actually being used on the ground and are effective.
* At the core of managing a major hazard business should be clear and positive process safety leadership with board-level involvement and competence to ensure that major hazard risks are being properly managed.

Why designers should pay more attention to ergonomic issues

Article in Engineer Live, March 2011. Interview with Gary Davis from Davis Associates

Gary reports a number of reasons why 'forward thinking' companies are taking ergonomics seriously. They include:

* Changing demographic; by the year 2020 half the adults in the UK will be aged 50 or over and the number of older people in the world will double to 1.2 billion by 2028. Inclusive design, in which products and services are usable by the widest possible range of people, therefore provides access to an expanding market.

* Usability is now considered to be a minimum requirement; the aim is to make a product more pleasurable to use and to give it the 'wow!' factor. Ergonomics is becoming recognised as something that can create a commercial advantage.

* Pressure from legislators. Examples include ISO 11064 (Ergonomic design of control centres), the Equal Treatment Framework Directive (2000/78/EC) and the new EU Machinery Directive (2006/42/EC), which places a much greater emphasis on ergonomics than its predecessor.

* Ergonomics is a good way to protect a brand or strengthen it. If your products gain a reputation for being user-friendly, this can become important in markets where product differentiation is otherwise difficult.

* Conversely, the relatively new phenomenon of websites providing consumers with the opportunity to review products can damage brands if products are found to be unsatisfactory.

* If you consider ergonomics from the outset, you are less likely to have to make last-minute design changes.

* If products are intuitive and comfortable to use, this reduces the need to provide after-sales support, plus it helps to minimise the number of product returns and warranty claims.

* Applying ergonomics to operator workstations can improve safety, reduce the opportunities for errors to be made, and raise productivity.

Quantifying the benefits of improved ergonomics can be difficult, but one area where this is done routinely is in website design, particularly for sales-orientated sites. Indeed, a specialist industry has evolved to improve website usability - which can be translated very rapidly into increased sales and, ultimately, profits that far outweigh the costs of the usability consultancy.

Elsewhere, of course, investing in ergonomics can reap rewards in the early stages of a project and at other points too - such as when alternative concepts are being assessed, and when pre-production units are available for user trials. Almost any level of ergonomics input can benefit a design, resulting in improved consumer satisfaction, appeal to a wider range of users, enhanced safety and so on.

Thursday, March 03, 2011

July 7/7 - Management jargon

Interesting comments made by Lady Justice Hallett (the coroner), reported on the BBC website.

"A succession of senior figures from across the capital's emergency services have appeared before Lady Justice Hallett - but on the final day of the inquests, she told them to use plain English, rather than refer to things like the 'Conference Demountable Unit from the Management Resource Unit'."

"Management jargon is taking over organisations," said Lady Justice Hallett. "When it comes to something like a major incident, people do not understand what the other person is."

"All you senior people from these organisations are allowing yourselves to be taken over by management jargon... You people at the top need to say 'We have to communicate with other people and we communicate with plain English'."

"I am sorry if that sounded like a rant but everybody who has been here for the last few months will know I have been building up to it."