For businesses that rely on software systems to keep things up and running, a system failure can stop production and lead to financial loss.
In October 2016, internet users across Europe and North America experienced firsthand the frustration of service failure, when a distributed denial-of-service (DDoS) cyberattack on the DNS provider Dyn made many internet platforms and services unavailable. The Guardian, Twitter, Airbnb, and Freshbooks were all among the websites and services that were inaccessible for parts of the day.
The complex attack, which came from tens of millions of source IP addresses around the world, underscored just how reliant people and businesses are on the internet. But many businesses worry about an even bigger threat than the failure of their internet access, and that’s the failure of their software systems.
Why Is Software Failure So Risky?
Software failure can have more serious effects than temporarily preventing you from checking your tweets. For businesses that rely on software systems to keep things up and running, a system failure can stop production, interrupt processes, and ultimately lead to financial losses.
In March 2011, a problem with Commonwealth Bank of Australia’s banking systems software caused ATMs to overpay customers. More than a few people took advantage of the bug, withdrawing extra cash before the bank was able to shut down its entire network of ATMs. The bank said the problem occurred because of a glitch during “routine database management.”
In 2015, it was discovered that a software glitch had led to the early release of more than 3,200 prisoners in the US state of Washington. On average, prisoners whose sentences were wrongly calculated were released 49 days early. Incredibly, the glitch persisted for 13 years before a new IT boss for the state’s Department of Corrections found the error.
There is no question that comprehensive software systems are a blessing to companies; yet with their added convenience and profitability also comes the risk of software failing in a way that hurts a company’s bottom line. Given modern corporations’ reliance on systems such as enterprise resource planning (ERP), it is important to be smart about failure prevention.
Let’s talk about what risks your company should keep in mind, and preventative strategies you can take to help ensure those risks never become realities.
Risk Management: Calculating Risk for Software Failure
The term “risk management” refers to the identification, assessment and prioritization of risks — and the order of those three points is intentional. The first step in risk management is to identify risks. Only after a risk is acknowledged can it be assessed and prioritized.
You can use a formula to help identify risks, and to calculate the severity of each:
Risk = Gravity of injury × Probability of injury occurring
This formula applies to assessing all risk (not only software failure risk), and most of us use it all the time, whether or not we realize it.
If there is only a 15 percent chance of light showers on a given day (the probability), most of us don’t bother to grab an umbrella, because the consequence of being caught in an unexpected shower (the gravity of injury) is not serious enough to make it worth our time to carry one around. An 80 percent probability of showers, however, raises the level of risk, compelling most of us to take an umbrella on our way to work.
In some cases, the risk will change depending on your industry. For example, the gravity of injury from a sudden software outage is less severe for a retail company than for a hospital. As a result, even when the probability of an outage is the same, the overall risk for the retailer is lower.
Once your company has identified and assessed the risk of potential software failures, those risks can be prioritized and accordingly addressed in your prevention strategies.
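To make the formula concrete, here is a minimal Python sketch of how a team might score and rank a handful of risks. The `Risk` class, the `prioritize` function, the 1-to-10 gravity scale, and the three example risks with their numbers are all assumptions made up for illustration; your own scales and estimates will differ.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    gravity: float       # assumed scale: 1 (minor) to 10 (severe)
    probability: float   # estimated chance of occurring, 0.0 to 1.0

    @property
    def score(self) -> float:
        # Risk = gravity of injury x probability of injury occurring
        return self.gravity * self.probability


def prioritize(risks):
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical risks with made-up estimates.
risks = [
    Risk("Reporting dashboard outage", gravity=2, probability=0.30),
    Risk("Order database corruption", gravity=9, probability=0.05),
    Risk("Payment system outage", gravity=8, probability=0.10),
]

for risk in prioritize(risks):
    print(f"{risk.name}: {risk.score:.2f}")
```

With these made-up numbers, the payment outage ranks first even though the dashboard outage is three times as likely, because its gravity is so much higher. That is the same trade-off as the umbrella decision above.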
Why Software Failure Happens
To prevent software failures, it’s important to understand not only how to identify risk, but also how software failure happens. From there, your company can develop strategies to prevent the most common (and most risky) software failures.
The number one cause of software failure is human error in application programming. This failure happens during the coding process, often due to oversights in the software development lifecycle. A programmer may fail to consider extreme or unusual user inputs, which can surface as bugs once the software is deployed and users begin using it in ways the programmer didn’t anticipate.
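As a rough illustration of what "considering unusual inputs" can look like in code, here is a small Python sketch. The function name `parse_quantity`, the order-quantity scenario, and the limit of 1,000 are hypothetical choices for this example, not taken from any real system.

```python
def parse_quantity(raw: str, max_quantity: int = 1000) -> int:
    """Convert user-supplied text to an order quantity, rejecting unusual input."""
    text = raw.strip()
    if not text:
        raise ValueError("quantity is required")
    try:
        quantity = int(text)
    except ValueError:
        raise ValueError(f"quantity must be a whole number, got {raw!r}") from None
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    if quantity > max_quantity:
        raise ValueError(f"quantity must not exceed {max_quantity}")
    return quantity


# Try a mix of ordinary and unusual inputs.
for raw in ["12", " 7 ", "", "ten", "-4", "50000"]:
    try:
        print(repr(raw), "->", parse_quantity(raw))
    except ValueError as error:
        print(repr(raw), "-> rejected:", error)
```

The point of the sketch is that empty, non-numeric, negative, and absurdly large values are each handled on purpose, rather than left to fail in some unpredictable way after deployment.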
Some failures also relate to small sign errors, such as when a programmer accidentally uses a "greater than" sign rather than a "less than" sign in their code. These errors should be caught and addressed in the development process, but there are times when they slip through and result in software that can’t perform a certain action or doesn’t react properly to certain inputs.
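A flipped comparison is easy to miss when reading code, but a few checks around the boundary value usually expose it. The sketch below assumes a hypothetical `is_overdrawn` function for a bank balance; the name and the scenario are invented for illustration.

```python
def is_overdrawn(balance: float) -> bool:
    """Return True when the account balance has dropped below zero."""
    return balance < 0  # writing `balance > 0` here would invert the result


# Boundary checks: values just below, exactly at, and just above the threshold.
assert is_overdrawn(-0.01) is True
assert is_overdrawn(0) is False
assert is_overdrawn(0.01) is False
```

If the operator were accidentally reversed, the first and third checks would fail immediately, so the error would be caught during development instead of in production.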
Preventing Software Failure
With the prevalence of human error, it’s unavoidable that some software will deploy with bugs and errors that slip through the cracks during development. As a business leader, although you don’t have control over the source code or the development process, you can take steps to prevent software malfunction — and to identify potential problems before they can cause interruptions to your day-to-day business.
Here are some techniques for staying on top of software malfunction:
- Hire an in-house programmer. If your company can afford it, having a quality programmer on staff gives you a huge advantage when it comes to catching and correcting software errors. Your in-house programmer can perform tests and monitor your system at the code level.
- Find the right software provider to monitor your system. Many companies don’t have the budget (or flexibility in their software) for an in-house programmer to monitor and make adjustments to the source code. In these cases, your software provider can be your best friend. While you may think of your provider simply as the company you purchase software from, many third-party providers also offer services like ongoing monitoring and maintenance of your software system. The right software provider can be a huge asset to your business, providing scheduled routine maintenance, backup services and expert support if you ever do experience a malfunction.
- Invest time and money into development and implementation. You may be concerned about the learning curve for new software impacting your business operations, but it’s worth taking your time with both the research and the implementation. Make sure you’re investing in a high-quality system that’s best suited to your business’s needs, planning a smooth process for implementing it, and offering thorough training for your staff. In software, just like in manufacturing or any other industry, you get what you pay for. Implementing a high-quality software system for your company requires time and money, and rushing through the process makes you more susceptible to missed bugs and opportunities for malfunction.
A 2002 study from the US Department of Commerce’s National Institute of Standards and Technology concluded that software bugs were so common and damaging that they cost the American economy an estimated $59.5 billion every year. By understanding how to assess and prioritize the risk of software failure, and taking the time to find the right provider and implement software successfully, it is possible for your company to identify and even avoid many software failures.