The launch ended in failure due to multiple errors in the software design: dead code, intended only for Ariane 4 and inadequately protected against integer overflow, raised an exception that was handled inappropriately, halting the whole of the otherwise unaffected inertial navigation system. This caused the rocket to veer off its flight path 37 seconds after launch; it began to disintegrate under high aerodynamic forces and finally self-destructed via its automated flight termination system. The failure has become known as one of the most infamous and expensive software bugs in history,[2] resulting in a loss of more than US$370 million.[3]
Launch failure
The Ariane 5 reused the code from the inertial reference platform of the Ariane 4, but the early part of the Ariane 5's flight path differed from that of the Ariane 4 in having higher horizontal velocity values. This caused an internal value BH (Horizontal Bias), calculated in the alignment function, to be unexpectedly high. The alignment function was operative for approximately 40 seconds of flight, which was based on a requirement of Ariane 4, but served no purpose after lift-off on the Ariane 5.[4] The greater values of BH caused a data conversion from a 64-bit floating-point number to a 16-bit signed integer value to overflow and raise a hardware exception.[5] To keep within a required maximum workload target of 80% for the on-board Inertial Reference System computer, the programmers had protected only four out of seven critical variables against overflow. For the three unprotected variables they relied on assumptions about the possible range of values which were correct for the trajectory of Ariane 4, but not for that of Ariane 5.[6] The exception halted both of the inertial reference system modules, although they were intended to be redundant. The active module presented a diagnostic bit pattern to the On-Board Computer, which interpreted it as flight data and, in particular, commanded full nozzle deflections of the solid boosters and the Vulcain main engine. This led to an angle of attack of more than 20 degrees, causing separation of the boosters from the main stage, the triggering of the self-destruct system of the launcher, and the destruction of the flight.[4]
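The narrowing conversion described above can be illustrated with a minimal sketch. It is written in Python for readability and is not the flight code, which was written in Ada; the function names and the sample value 40000.0 are assumptions chosen for illustration and do not correspond to actual BH values.

```python
import math

# Range of a signed 16-bit integer.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(value: float) -> int:
    """Model an unguarded narrowing conversion: out-of-range input raises."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit integer")
    return result


def to_int16_saturating(value: float) -> int:
    """Model a protected conversion: clamp to the representable range."""
    if math.isnan(value):
        raise ValueError("cannot convert NaN")
    return int(max(INT16_MIN, min(INT16_MAX, value)))


if __name__ == "__main__":
    hypothetical_bh = 40000.0  # illustrative only, larger than INT16_MAX
    print(to_int16_saturating(hypothetical_bh))  # clamped to 32767
    try:
        to_int16_unprotected(hypothetical_bh)
    except OverflowError as exc:
        print("unprotected conversion failed:", exc)
```

Clamping is only one possible form of protection. Whether a saturated value is acceptable depends on how the result is used downstream, which this sketch does not model.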
The official report on the crash (conducted by an inquiry board headed by Jacques-Louis Lions) noted that "An underlying theme in the development of Ariane 5 is the bias towards the mitigation of random failure. The supplier of the inertial navigation system (SRI) was only following the specification given to it, which stipulated that in the event of any detected exception the processor was to be stopped. The exception which occurred was not due to random failure but a design error. The exception was detected, but inappropriately handled because the view had been taken that software should be considered correct until it is shown to be at fault. [...] Although the failure was due to a systematic software design error, mechanisms can be introduced to mitigate this type of problem. For example the computers within the SRIs could have continued to provide their best estimates of the required attitude information. There is reason for concern that a software exception should be allowed, or even required, to cause a processor to halt while handling mission-critical equipment. Indeed, the loss of a proper software function is hazardous because the same software runs in both SRI units. In the case of Ariane 501, this resulted in the switch-off of two still healthy critical units of equipment."[4]
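The report's suggestion that the SRIs "could have continued to provide their best estimates" can be sketched as a fallback policy. The following Python example is a hypothetical illustration, not the mitigation actually adopted: the class names, the choice of the last good value as the "best estimate", and the `degraded` flag are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AttitudeEstimate:
    value: float
    degraded: bool  # True when the value is a fallback, not a fresh result


class AttitudeChannel:
    """Hypothetical wrapper that keeps serving data when a computation fails."""

    def __init__(self, compute: Callable[[float], float], initial: float = 0.0):
        self._compute = compute
        self._last_good = initial

    def read(self, raw: float) -> AttitudeEstimate:
        try:
            fresh = self._compute(raw)
        except ArithmeticError:
            # Instead of stopping the processor, return the last good value
            # and flag it so the consumer knows the data is degraded.
            return AttitudeEstimate(self._last_good, degraded=True)
        self._last_good = fresh
        return AttitudeEstimate(fresh, degraded=False)


def narrow(raw: float) -> float:
    """Stand-in computation that overflows outside the 16-bit range."""
    if abs(raw) > 32767:
        raise OverflowError("value out of 16-bit range")
    return float(int(raw))


if __name__ == "__main__":
    channel = AttitudeChannel(narrow)
    print(channel.read(100.5))  # fresh value, degraded=False
    print(channel.read(1e6))    # last good value, degraded=True
```

Catching `ArithmeticError` covers Python's `OverflowError` and `ZeroDivisionError`. A real system would need a considered policy on how long a stale estimate may be served.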
Other issues identified in the report focused on testing:[4]
The purpose of the review process, which involves all major partners in the Ariane 5 programme, is to validate design decisions and to obtain flight qualification. In this process, the limitations of the alignment software were not fully analysed and the possible implications of allowing it to continue to function during flight were not realised.
The specification of the inertial reference system and the tests performed at equipment level did not specifically include the Ariane 5 trajectory data. Consequently, the realignment function was not tested under simulated Ariane 5 flight conditions, and the design error was not discovered.
It would have been technically feasible to include almost the entire inertial reference system in the overall system simulations which were performed. For a number of reasons it was decided to use the simulated output of the inertial reference system, not the real system or its detailed simulation. Had the system been included, the failure could have been detected. Post-flight simulations have been carried out on a computer with software of the inertial reference system and with a simulated environment, including the actual trajectory data from the Ariane 501 flight. These simulations have faithfully reproduced the chain of events leading to the failure of the inertial reference systems.
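The kind of test the report describes, running the real conversion against trajectory data for the new launcher, can be sketched as follows. This Python example is an illustration under stated assumptions: both profiles are invented linear ramps indexed by an abstract step number, not Ariane 4 or Ariane 5 flight data, and `first_failure` is a hypothetical helper.

```python
from typing import Callable, Iterable, Optional, Tuple

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16(value: float) -> int:
    """Narrowing conversion that raises when the value does not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit integer")
    return result


def first_failure(
    samples: Iterable[Tuple[int, float]],
    convert: Callable[[float], int],
) -> Optional[int]:
    """Return the step of the first sample that `convert` rejects, or None."""
    for step, value in samples:
        try:
            convert(value)
        except OverflowError:
            return step
    return None


# Invented profiles: a gentle ramp that stays in range and a steeper one
# that does not. The numbers are illustrative only.
gentle_profile = [(step, 500.0 * step) for step in range(41)]
steep_profile = [(step, 1000.0 * step) for step in range(41)]

if __name__ == "__main__":
    print(first_failure(gentle_profile, to_int16))  # None
    print(first_failure(steep_profile, to_int16))   # 33
```

The point of the sketch is that the same code passes or fails depending only on the input profile, which is why testing against the intended trajectory matters.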
Another perspective of the failure, based on systems engineering, focuses on requirements:[7]
The ranges of variables such as horizontal velocity and the quantity BH computed from it should have been explicitly quantified. Instead, a 16-bit range was assumed.
The alignment task should have been deactivated at an appropriate moment. Instead, the alignment task was running after lift-off.
A failure model of the inertial reference platforms should have been analyzed to ensure that service would be continuously delivered throughout the flight, rather than assuming that at most one module would fail. Instead, both modules failed, and rather than killing the flight gracefully, output diagnostic messages were interpreted as flight data.
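Two of the requirements above, deactivating the alignment task after lift-off and keeping diagnostic output distinct from flight data, can be sketched in code. The Python example below is a hypothetical illustration: the message types, field names, and functions are assumptions and do not reflect the actual interface between the inertial reference system and the On-Board Computer.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FlightData:
    attitude_deg: float


@dataclass(frozen=True)
class Diagnostic:
    code: int


# Every message carries its kind in its type, so a diagnostic
# can never be read as an attitude value by accident.
Message = Union[FlightData, Diagnostic]


def steering_input(msg: Message) -> Optional[float]:
    """Return an attitude only for genuine flight data.

    None signals that no valid data is available, forcing the caller
    to handle the loss explicitly rather than steer on a bit pattern.
    """
    if isinstance(msg, FlightData):
        return msg.attitude_deg
    return None


def alignment_task_enabled(lifted_off: bool) -> bool:
    """The alignment task serves no purpose in flight, so gate it on lift-off."""
    return not lifted_off


if __name__ == "__main__":
    print(steering_input(FlightData(attitude_deg=1.5)))  # 1.5
    print(steering_input(Diagnostic(code=0x7FFF)))       # None
    print(alignment_task_enabled(lifted_off=True))       # False
```

What the caller should do on receiving `None` (switch units, hold the last command, or terminate the flight) is a system-level decision that the sketch deliberately leaves open.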
Payload
Cluster consisted of four 1,200-kilogram (2,600 lb) cylindrical, spin-stabilised spacecraft, powered by 224-watt solar cells. The spacecraft were to have flown in a tetrahedral formation and were intended to conduct research into the Earth's magnetosphere. The satellites would have been placed into highly elliptical orbits of 17,200 by 120,600 kilometres (10,700 by 74,900 mi), inclined at 90 degrees to the equator.[8]
Aftermath
Following the failure, four replacement Cluster II satellites were built. These were launched in pairs aboard Soyuz-U/Fregat rockets in 2000.
The launch failure brought the high risks associated with complex computing systems to the attention of the general public, politicians, and executives, resulting in increased support for research on ensuring the reliability of safety-critical systems. The subsequent automated analysis of the Ariane code (written in Ada) was the first example of large-scale static code analysis by abstract interpretation.[9]
The failure also harmed the excellent success record of the European Space Agency's rocket family, set by the high success rate of the Ariane 4 model. It was not until 2007 that Ariane 5 launches were recognised as being as reliable as those of the predecessor model.[10]
See also
Mars Climate Orbiter: software that had been adapted from an earlier mission was not adequately tested before launch
Le Lann, Gérard (March 1997). "An Analysis of the Ariane 5 Flight 501 Failure – A System Engineering Perspective". Proceedings of the 1997 International Conference on Engineering of Computer-Based Systems (ECBS'97). IEEE Computer Society. pp. 339–346. doi:10.1109/ECBS.1997.581900. ISBN 0-8186-7889-5.
Wired – History's Worst Software Bugs: an article about the top 10 software bugs, which include the Ariane 5 Flight 501 software glitch.
Ariane 5 – 501 (1–3) (in German): an article that gives the actual code in question.