“Only 62 minutes”: From security provider to security problem

“Your company can be ruined in just 62 minutes”: this is how the security provider CrowdStrike advertises its services. Now the US manufacturer has itself caused damage estimated in the billions of dollars through a faulty product update, and at breakneck speed.

On 19 July 2024 at 04:09 UTC, the security specialist CrowdStrike distributed a driver update for its Falcon software for Windows PCs and servers. Just 159 minutes later, at 06:48 UTC, Google Compute Engine reported the problem, which “only” affected certain Windows computers and servers running CrowdStrike Falcon software.

Almost five per cent of global air traffic was brought to a standstill as a result, and 5,000 flights had to be cancelled. Supermarkets from Germany to New Zealand had to close because their checkout systems failed. A third of all Japanese McDonald's branches closed their doors at short notice. Among the US authorities affected were the Department of Homeland Security, NASA, the Federal Trade Commission, the National Nuclear Security Administration and the Department of Justice. In the UK, even most doctors’ surgeries were affected.

The problem

The incident points to a burning problem: the centralisation of services and the increasing interconnection of the IT systems behind them make us vulnerable. If one service provider in the digital supply chain is affected, the entire chain can break, leading to large-scale outages. In this case, the Microsoft Azure cloud was also affected, with thousands of virtual servers unsuccessfully attempting to restart. Prominent figures among those affected reacted bluntly. Elon Musk, for example, wants to ban CrowdStrike products from all his systems.

More alarming, however, is the fact that security software is being used in areas for which it is not intended. Although the manufacturer advertises in drastic terms the threat posed by third parties, it accepts no responsibility for the problems that its own products can cause or for the consequential damage. In its terms and conditions, CrowdStrike expressly advises against using its solutions in critical areas. They state, in capital letters: “THE OFFERINGS AND CROWDSTRIKE TOOLS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT.”

The question of liability

Not suitable for critical infrastructures, but often used there: How can this happen? Negligent errors with major damage, but no liability on the part of the manufacturer: How can this be?

In the context of open source, it is often incorrectly argued that the question of liability in the event of malfunctions and risks is unresolved, even though most manufacturers who place open source on the market with their products do provide a warranty.

We can do a lot to make things better by tackling the problems caused by poor quality and dependence on individual large manufacturers. Of course, an open source supply chain is viewed critically, and that’s a good thing. But it has clear advantages over a proprietary supply chain. The incident is a striking example of this. With appropriate toolchains, it is easy to stop a scheduled update in which basic components simply do not work from being rolled out, and open source companies routinely do exactly that.
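To make the idea of a toolchain that blocks a broken update more concrete, here is a minimal sketch of a release gate. It is only an illustration: the names `SmokeCheck`, `release_gate`, `not_empty` and `not_all_zero` are hypothetical, and the two checks are placeholders. They do not describe the actual cause of the CrowdStrike incident or any vendor's real pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SmokeCheck:
    """A named sanity check that an update must pass before release."""
    name: str
    run: Callable[[bytes], bool]  # returns True if the update passes


def release_gate(update: bytes, checks: list[SmokeCheck]) -> tuple[bool, list[str]]:
    """Return (approved, failed_check_names).

    The update is approved only if every check passes. A check that
    raises an exception counts as a failure, so the gate fails closed.
    """
    failed = []
    for check in checks:
        try:
            ok = check.run(update)
        except Exception:
            ok = False
        if not ok:
            failed.append(check.name)
    return (not failed, failed)


# Hypothetical placeholder checks, chosen only to keep the sketch short.
def not_empty(update: bytes) -> bool:
    return len(update) > 0


def not_all_zero(update: bytes) -> bool:
    return any(update)  # True if at least one byte is non-zero


CHECKS = [
    SmokeCheck("not_empty", not_empty),
    SmokeCheck("not_all_zero", not_all_zero),
]

if __name__ == "__main__":
    approved, failed = release_gate(b"\x00" * 16, CHECKS)
    print("approved:", approved, "failed checks:", failed)
```

In a real pipeline the checks would be far more substantial, for example loading the update on test machines and rolling it out to a small group of systems first. The principle stays the same: an update that fails any basic check never reaches customers.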

The consequences

So what can we learn from this disaster and what are the next steps to take? Here are some suggestions:

  1. Improve quality: The best lever to put pressure on manufacturers is to increase the motivation for quality via stricter liability. The Cyber Resilience Act (CRA) offers initial approaches here.
  2. Safety first: In this case, this rule relates primarily to the technical approach to product development. Deeply intervening in customer systems is controversial in terms of security. Many customers reject this, but those affected obviously do not (yet). They have now suffered the damage. There are alternatives, which are also based on open source.
  3. Use software only as intended: If a manufacturer advises against use in a critical environment, then this is not just a phrase in the general terms and conditions, but grounds for ruling the product out.
  4. Centralisation with a sense of proportion: Centralising the digital supply chain has advantages and disadvantages that need to be weighed against each other. When dependency meets a lack of trustworthiness, risks and damage arise. The public authorities and companies that use the software are then left waiting helplessly, without alternatives and without sovereignty of their own.