Podcast: How CrowdStrike caused a global IT meltdown and what comes next

Harun Ozalp | Anadolu | Getty Images

"Blue screen of death" errors are seen on computer screens in Ankara, Turkey, on July 19, 2024, during the global communications outage caused by CrowdStrike, which provides cybersecurity services to U.S. technology company Microsoft.

  • Last week, an update from cybersecurity firm CrowdStrike caused the Windows operating system to crash in what was perhaps the largest IT failure in history.
  • Industries across the board were hit, with airlines cancelling flights, broadcasters not able to go on air and shops not being able to open.
  • In the latest episode of CNBC Tech's "Beyond the Valley" podcast, CNBC's Arjun Kharpal and Tom Chitty discuss the IT failure and whether it might happen again.

Last week, the world faced what was likely the biggest IT failure in history.

When some people around the world logged onto their laptops on Friday, they were greeted with a blue error screen on their Microsoft Windows operating system.

But this was not a Microsoft issue. It was down to a U.S. cybersecurity firm called CrowdStrike, which sent out a buggy software update that crashed Windows.

Industries across the board were hit, with airlines cancelling flights, broadcasters not able to go to air and shops not being able to open.

Businesses around the world use CrowdStrike's software to protect their IT systems from hackers. The dependence on such a company exposed the fragility of global businesses' reliance on a small number of IT vendors.

CrowdStrike rolled back the update, but it took some time for companies to get back online.

For me, Friday was a professional day unlike any other.

In the latest episode of CNBC Tech's "Beyond the Valley" podcast — which you can listen to above — Tom Chitty and I talk about what was behind the IT failure, how CNBC covered the event and whether something like this could happen again.

If you have any thoughts on this or previous episodes, please email us at beyondthevalley@cnbc.com.

You can subscribe to "Beyond the Valley" by clicking the links below to your chosen platform:

Apple Podcast

Google Podcasts

Spotify

Here is a transcript of the "Beyond the Valley" episode released on July 22, 2024. It has been edited for clarity and brevity. 

Tom Chitty 

Last week's IT failure may go down as the worst in history. Computers around the world began to grind to a halt on Thursday night, after a faulty software update containing a single defect caused severe disruption to air travel, hospitals, banks and much more. This week, we're going to explain how this happened and the subsequent fallout. We'll also hear Arjun's firsthand account of his day at CNBC's London offices, and how CNBC managed to get on air. And finally, what does this failure tell us about the vulnerable foundations on which today's economy is built? And how can we make it stronger? We had a quick chat at lunchtime last Friday, when you said, I think the words were, 'I've never known a day like it.' So, to give our listeners an idea, how did the day start?

Arjun Kharpal

As you mentioned, [it was] a day unlike any I've ever experienced in my career here as a journalist. I was scheduled to talk about Netflix earnings. Overnight, we had first heard about issues with Microsoft's cloud service. But before we were scheduled to go on air, people's computers started to crash, and there were questions over whether we would get on air or not. Anyway, it was all fine in the first hour of Squawk Box, our morning show, and I spoke about Netflix. After I came off air, nearer to 7 a.m. London time, I noticed that my computer had crashed, and others too. Mine hadn't crashed until then, though others' had, but it wasn't as widespread. At this point, computers started falling like dominoes, and the blue screen of death, as it's called, was appearing. But we didn't really know what was causing it yet, and I was frantically running around the office trying to figure out what had happened.

Tom Chitty 

Because you're in that unique position where you're being affected by this problem. But you've also got to cover this problem as a journalist that covers tech. So you are the news.

Arjun Kharpal

We were actually one of the first in the world to break this story, and we're going to talk about it. It came from an update issued by a company called CrowdStrike, which we'll get into. But the way we found that out, interestingly, was that because it was affecting us, our IT departments globally had been speaking to CrowdStrike support. We then got confirmation from the company, via our colleagues at NBC, that this was indeed the reason for this global IT meltdown. From that moment, I think I ran a report on this news. On cnbc.com, we got a headline out with the bare bones of what we knew, which was very limited at the time. And from then on, the story snowballed.

Tom Chitty 

We're going to talk more about who CrowdStrike are, because some people will probably never have heard of them. But first, do you have a stat of the week?

Arjun Kharpal

A billion dollars.

Tom Chitty 

Okay, great. Really specific. That works for me; I can work with that. It's weird to do this on my own, considering that for the last few episodes I've been competing against someone far more intelligent than myself. So let's see how I go. Okay, let's get into the bones of it. CrowdStrike. Who are CrowdStrike?

Arjun Kharpal

CrowdStrike are a U.S.-based cybersecurity company that sells cybersecurity software aimed at businesses.

Tom Chitty 

And their co-founder and CEO, George Kurtz, is a billionaire, and the company is worth billions of dollars. So this isn't just some small firm. This is a firm that supports the security of Microsoft.

Arjun Kharpal

And the security of organizations across the world. That's the more important part of the equation: it's not a small firm. Lots and lots of global businesses rely on CrowdStrike for their security. That's why this whole episode was such a big ordeal and why it was so widespread.

Tom Chitty 

There was also something that happened before the CrowdStrike issue, right, related to Microsoft's cloud.

Arjun Kharpal

So the timeline is quite important. When I got into the office on Friday morning, London time, Microsoft had issued an update overnight, late in the U.S., saying that its Azure cloud services were facing some problems and that there could be disruption to certain Microsoft cloud-based apps, like Teams, for example.

Tom Chitty 

So is that what you were talking about in the first update you had to do?

Arjun Kharpal

So that was the first thing. But what we found out later was that it was completely unrelated to what followed with CrowdStrike.

Tom Chitty 

That probably confused countless IT managers and engineers while this was happening, because they were probably thinking, oh, it must be related to the update earlier, right?

Arjun Kharpal

Yeah, that's right. There were questions over whether this was a Microsoft issue. When I first saw it, I thought a Microsoft issue was the reason Windows had crashed on my PC. But as we started to get more information, we found out it was to do with CrowdStrike. And the specific issue was this: CrowdStrike has software called Falcon, which is what they call an endpoint monitoring product. So it's effectively a piece of software designed to protect what they call endpoints. It's a jargony term in the cybersecurity industry; it basically means your laptop, your PC or your smartphone.

Cybersecurity firms like CrowdStrike, and this is normal for the industry, have to issue updates very often, because the threat landscape is constantly changing, including the different ways hackers might attempt to exploit vulnerabilities in machines. So they have to regularly update their patches and their defenses against new vectors of attack. That's what happened here: CrowdStrike issued a mundane update that was rolled out to its customers globally.

If you've got a smartphone, it sometimes automatically updates overnight, right? Or it will automatically download apps. This is very similar to what happened. The fact that CrowdStrike has to constantly update its offering is also where the weakness stems from. And they issued an update that had buggy code in it, effectively defective code. This kind of cybersecurity software is quite special. Because it's trying to protect an organization's entire infrastructure, it needs deep access to the kernel, which is effectively the core, the heart, of a computer's operating system. Because of that, if things go wrong, it can take down a system. So they issued this buggy update, and that effectively crashed Windows.

So it wasn't a Windows or Microsoft issue. It was a CrowdStrike issue. As a result, people began to see the so-called blue screen of death pop up on their PCs and laptops. And you would have seen an unhappy face on those error messages as well.

Tom Chitty 

So were they updating on their own, without you having to click on the update? Was it just happening in the background?

Arjun Kharpal

Yeah, effectively. But it wouldn't be something that you, as a PC user, would have seen. You wouldn't have to click "update now" or something like that. That would have been at the IT department level. What a morning it was here at CNBC; I've never experienced anything like it. There were questions over whether we would get on air.

Tom Chitty

Well, U.K. broadcaster Sky News didn't get on air, along with countless others.

Arjun Kharpal

There were computers offline; we were using our phones to get all of the information. So it was a crazy morning. I was sat on set at about 7:45, I think, doing a hit. I can't even remember what it was about; we didn't know a lot of information. So it was kind of like: we have some reports, we have some information, this is what we know. And then all of a sudden, at 7:45 a.m., the producers said stay on set, don't go anywhere. From that moment, I did not leave that set for about three and a half, four hours. I was constantly on air, update after update, minute after minute, as new things came through. We heard of airline systems not working, huge delays at airports across the world, people not being able to check into flights, and various different industries, banking, retail, all affected by this huge wipeout of global IT. It even got to the point where, on our first U.S. show, Worldwide Exchange, we were popping up almost every five minutes to give an update. It was just crazy. And the way the teams here handled it was incredible, globally, our producers. It was just an extraordinary morning.

Tom Chitty 

I imagine the U.S. team were probably asking lots of questions about what had been going on, or at least trying to rectify the situation with their own systems.

Arjun Kharpal

We'd been online for several hours already, and we had been following this story second by second. So they were asking us a lot of questions about what had happened, what had been said and what had gone wrong. We were able to provide those kinds of updates, but it was a really extraordinary day.

Tom Chitty 

One question I have about the update: I know you mentioned that they're making lots of updates all the time to stay ahead of any bad malware. But wouldn't they test this update before sending it out to eight and a half million Windows devices?

Arjun Kharpal

I'm sure testing had been done; I have no doubt about that. So that's the issue: how do you make that process more resilient? These are all the questions, and I think debates are happening now about the fragility of global IT systems, and particularly about whether that update should have been more robustly tested. And also, does it make sense to roll this out globally in one go? Should you phase it in? If you did a first phase of rollout, you would know then if there was an issue.

Tom Chitty 

Like a pilot episode of a TV series, to see what the reaction is. Is it positive?

Arjun Kharpal

Or has it crashed people's computers? And that's kind of what could have been done, I think.

Tom Chitty 

The financial fallout is hard to gauge exactly, but it's [essentially] a multibillion-dollar mistake. CrowdStrike, just to be clear, has admitted responsibility for the faulty software update. And I think the share prices have mirrored that, in the sense that Microsoft's hasn't budged and CrowdStrike's has plummeted. I suppose the question then is whether CrowdStrike is going to be the one footing the bill, but that whole process could take years to play out. And I know there are lots of air passengers who aren't going to get refunded or compensated for their missed or canceled flights. It's just so pervasive, isn't it, in terms of how it's affected so many industries.

Arjun Kharpal

Well, it's something that most people traveling on an airline wouldn't even think about: will my airline's IT systems be okay? But that's what's happened. I was also walking past a retailer in London on Friday, and they'd handwritten a note on their door saying, sorry, we're closed because our IT systems don't work. And that was just one case. You mentioned the airline passengers. If they're not getting refunds from their airlines because of this issue, they're unhappy. And it has probably cost the airlines millions of dollars.

Tom Chitty 

Airlines still have to cover the costs of hotels, food and any additional expenses related to the canceled or delayed flight. It's just that they're not paying compensation on top. This is in the U.K., according to the Civil Aviation Authority's advice: they're not giving passengers a lump sum of cash, because it was out of the airlines' hands. They didn't cause the issue.

Arjun Kharpal

You could imagine a lot of businesses that have lost money directly as a result of this issue may even be thinking about legal action against CrowdStrike. There could be a ton of fallout for years to come for CrowdStrike specifically, from the stock price reaction to the reputational damage and any potential legal action. But it's also a legal grey area. There's regulation in Europe, the U.S. and the U.K., for example, requiring companies that are the victim of a data breach or a hack to disclose it to regulators and their customers if it's material enough. Obviously, this quite clearly wasn't a hack, and it wasn't a cybersecurity incident. So what do you do in this situation? That's the big question.

Tom Chitty

Yeah, I mean, to [share] just a few other stats, two and a half thousand flights were canceled globally. And in the U.K., Friday was forecast to be the busiest day for departures since October 2019. Let's talk a little bit about which countries weren't affected, because there were some, including China and Russia. But I know you want to speak a little bit about China.

Arjun Kharpal

China is an interesting case. Because if you think about it, yes, Windows is used in China. But again, the issue wasn't a Windows issue; it was CrowdStrike. Chinese companies are, by and large, not using an American firm for their cybersecurity, and so they wouldn't have been affected by it. Some might even have been using a completely different operating system. I think it underscores the bifurcation of apps, operating systems and software that we continue to see between the U.S. and China. But that's why China wasn't affected. What about Russia?

Tom Chitty

The irony is that Russia avoided the chaos because Western sanctions mean it doesn't use software owned by Western companies such as Microsoft and CrowdStrike. So Russia is becoming increasingly self-sufficient, using companies such as Moscow-based Kaspersky for its antivirus needs. So it avoided all of it. This wasn't a hack or a cybersecurity breach, but it has been reported that hackers were trying to take advantage of what happened. What was going on there?

Arjun Kharpal

Yeah, it was quite rudimentary in many ways. They weren't trying to exploit technical vulnerabilities at all; it was more that they were trying to impersonate CrowdStrike support or Microsoft support, saying, hey, click this link and we'll solve your IT issues. It's what they call a phishing attempt. These messages often come with a malicious link that effectively steals your data if you click it, so don't do that.

Tom Chitty 

So we've talked about the costs, but are there any lessons to be learned from what's happened, and what could be done? I know that in April, the Cyber Safety Review Board, which is part of the U.S. Department of Homeland Security, issued a pretty scathing report on Microsoft's inadequate security culture. This was off the back of a Chinese hack that affected U.K. and U.S. personnel following a summit that the two countries had. So there have already been some concerns around Microsoft's security and protocols. But this doesn't necessarily feel like their issue.

Arjun Kharpal

I think the biggest lesson everyone has learned is how fragile the global IT system is, how much power is concentrated in individual companies and their software, and how businesses rely on very few vendors. That creates a huge amount of risk in the global IT system.

Tom Chitty 

Should there be more regulation to break up, you know, what looks like a monopoly?

Arjun Kharpal

Well, that's a whole other discussion. The issue here is that some companies offer many services. If you take Microsoft or Amazon, they're not just offering you one service; they're offering you the cloud and Teams and everything else. That becomes attractive for businesses. There is the question of whether all these kinds of offers and bundling of services should be banned. But right now, businesses would rather say, well, I pay you one thing and you sort out everything; it's more convenient, right? So the question is, does this spur businesses to think about the way their technology stack is built and say, you know what, maybe we need a couple of different providers for this kind of cybersecurity? The question is also how easy that is. It's not easy, but I think it will spur some thinking about how businesses rely on effectively one or two companies, where a failure at any point can bring down a whole organization, as we've seen.

Tom Chitty 

And also having plans in place to counter an outage at a third party, because it looks like not enough planning was done to counter what was essentially a very small defect in the code that has brought the globe to a standstill in many respects.

Arjun Kharpal

One of the most interesting things is that over the last few years, and even now, there's been all this talk about the move to the cloud, right? The move to hosting your business and data on servers owned by Microsoft, Amazon, Google and others. But again, you are necessarily handing power and control over to those companies, and if things go wrong, you might not have the ability to do backups and control all of that data. So there's also a vein of thinking that, while a lot of stuff does need to move to the cloud, and it gives you many advantages in terms of cost savings, nimbleness and access to new AI applications, perhaps businesses actually need to keep some of that data on premises, on servers in their office somewhere or nearby. That's an interesting thought, because a few years ago, it was more, let's just digitize everything and move it all to the cloud. So we'll see if that trend plays out, and what kind of long-lasting effect this has on companies like CrowdStrike. But the big question is, can IT departments figure out how to effectively diversify a business's IT supply chain?

Tom Chitty

Final question. Will this happen again?

Arjun Kharpal

Probably, probably, because as long as this fragility exists and there's a lack of regulation, the atmosphere is ripe for something like this. Power remains concentrated in the hands of very few companies that businesses rely on. I'll just read you a quote from the former chief executive of the U.K.'s National Cyber Security Centre, Professor Ciaran Martin, who said this to Sky News: "The worst of this is over because the nature of this crisis was such that it went very badly wrong very quickly. It was spotted quite quickly. And essentially, it was turned off. Until governments and the industry get together and work out how to design out some of these flaws, I'm afraid we are likely to see more of these again. Within countries like the U.K. and elsewhere in Europe, you can try and build up that national resilience to cope with this. But ultimately, a lot of this is going to be determined in the U.S."

He's hinting at the fact that so many of these companies, CrowdStrike, Amazon, Microsoft, Google, are American firms. So, yes, the likelihood of this happening again is quite high.

Tom Chitty 

Sobering thoughts. But on that note, let's do the stat of the week to lift everyone's mood.

Arjun Kharpal

Billion dollars.

Tom Chitty

The value CrowdStrike has lost since Friday?

Arjun Kharpal

No. It's quite an obvious one. Shall I give you a second guess?

Tom Chitty 

The amount it's expected to cost companies?

Arjun Kharpal

Yeah. It's a very early estimate, and the real figure is likely higher, but it's the economic impact: the amount this IT outage has cost businesses. That's according to Patrick Anderson, the CEO of Anderson Economic Group, a Michigan research firm that specializes in estimating the economic costs of events like strikes and other business disruptions, and that stat has come via CNN.

Tom Chitty

Alright, that's it for this week. Before we go, please follow and subscribe to the show. And you can leave us a review if you'd like. And thank you, Arjun.

Arjun Kharpal

Thank you, Tom.

Tom Chitty

We'll be back next week for another episode of Beyond the Valley. Goodbye.

Copyright CNBC