I am a big fan of risk-based approaches. I have developed some of these approaches, used them, and taught them over the past 18 years. In order to prioritize testing, we need to understand risk and apply it intelligently.
I've also been burned by risk-based approaches. Software that should not have failed did fail, and caused a greater impact than expected. This happens even to the best of the best. Consider situations where the decision was risk-based and people still ended up on the losing end of the risk equation:
- Car owners who earn low safe-driving premiums sometimes wreck their cars.
- Homeowners in a "non-tornado" area who lose a home to a tornado.
- People who outlive their life insurance policies.
- Good credit risks who go bankrupt.
This is my little "lessons learned" compilation about risk-based testing - especially concerning ways that we can be fooled by risk.
#1 - There is no physics of software that dictates how it must behave. Even if our risk assessment indicates that a software item should not fail, the software may not behave that way. As Lee Copeland states in his book, A Practitioner's Guide to Software Test Design, in the discussion on pairwise testing, "there is no underlying 'software defect physics'...". In other words, in spite of what we think we know about software and defects, we can still see software behavior that evades our assumptions and approaches. As testers, we need to understand that all the tricks and techniques we use may be helpful, but they are not guaranteed to be totally accurate or effective.
Many risk assessment methods are based upon the observed behavior of software in given situations, but there are no universal laws of software. What we have observed over years of watching software behave may not apply to every situation. Therefore, we need to be careful about concluding which risks are present based on someone else's experience.
#2 - We can't see clearly into the future. Unexpected things can happen that can change the risk equation by a little or a lot. Risk by its very nature contains a degree of uncertainty.
Another way to say this is "You don't know what you don't know." There are things that are outside of our knowledge. The troubling thing is that sometimes we don't even know what those things are!
#3 - People do not always provide accurate information. When we base a risk assessment on information obtained from people, there is always the possibility that the information is skewed, inaccurate or misleading. It's not necessarily that people lie (although they do from time to time), but sometimes they forget things, or their recollection is slanted toward their own view of events.
#4 - The "I want to believe" syndrome. There are times when we don't have a rational reason to believe in something, but we would really like to. In terms of risk, some people believe the risk is lower than is actually justified. They may want to believe that a software item will not fail, but that does not change the actual risk. This could also be called "risk denial." Risk denial is one approach to dealing with risk, just not a good one.
#5 - The "High Risk" effect. This is the opposite of the "I Want to Believe" syndrome. In this view, so many things are seen as high risks that the value of risk assessment is lost. Few things fall into the "low" or "moderate" categories. People may tend to favor the importance of their own areas and think that if anything there fails, the entire operation will cease. The problem is that workable risk levels are never established - everything is a high risk.
#6 - Flawed assessment methods. This can result from many causes, but the most common ones I have seen are 1) adopting someone else's method that does not transfer well to your context, 2) devising an inaccurate and unproven method of your own, and 3) applying a sound method incorrectly because of a lack of understanding. The problem is that you may place a lot of faith in the assessment method without fully realizing its limitations and risks.
#7 - No assessment method. In this case, risk assessment is based on intuition alone. You can easily be fooled this way, but at least you are aware that it is a guess (perhaps an informed one, but still a guess). A major problem is that you have nothing upon which to base risk assumptions. If you later need to defend a risk-based decision, you have little to show for how you arrived at it. This is a very bad position, especially when safety, large sums of money, or reputation is at stake.
#8 - Failing to incorporate intuition. I know, another opposite. I just wrote that intuition alone can fool you and can leave you defenseless if anyone challenges your rationale. However, there is a role for intuition. There have been many times when I followed a hunch and found defects even when the risk assessment indicated a low risk of failure. Unfortunately, this is not something that can be trained; it must be learned over time. Experience forms the basis of much of what we call intuition, and we need to learn to listen to the inner voice that tells us to consider things a risk assessment may not indicate.
#9 - Only performing the risk assessment once. Risk assessment is a snapshot taken at a given point in time. The problem is that risks change throughout a project. To get an accurate view of risk, assuming the method is reasonably sound, the assessment should be performed on a regular basis. In fact, the assessment should continue even after system deployment because the risks are still present and still changing. Ideally, you should have pre-defined checkpoints throughout the project, such as at the concept stage, requirements, design, code, test and deployment. Some people find that even within each of these project activities multiple risk assessment snapshots may be needed.
#10 - Failing to report risk assessment results accurately and promptly. The longer a known risk goes unreported, the less time is available to address it. In addition, the risk may increase or decrease over time. When risk assessment results are conveyed with missing, incorrect or ambiguous information, any conclusion based on them is at risk of being wrong. Keep in mind that risk assessment results are like any other form of measurement: they can be manipulated to suit the objectives of the presenter or the receiver. A case in point is when the presenter of the results is fearful of giving bad news.
#11 - Failing to act on assessment results. Unless you take action on a risk, the risk assessment is little more than an exercise. You may have learned a great deal from the assessment, but to prevent problems or prioritize testing you must make adjustments in how things are currently being done. You may have an accurate and timely risk assessment, but it does little good without application.
#12 - An inappropriate view of risk. You can view risk from multiple perspectives. The project view of risk includes how the project is managed, the level of user and customer involvement, the quality of user requirements, and so on. This view is helpful in keeping the project on track and preventing project failures. The technical view of risk identifies where the risk resides in a system or application. This is the view needed to prioritize testing at the application level. Finally, there is the business or mission view of risk, which assesses how our business, mission, customers or users may be impacted. This view is helpful in determining the criticality of processes.
Another aspect of the level of risk is shown in the example below. Let's say you have performed a risk assessment and assigned levels of risk to test cases (Figure 1). This is fine for testing isolated functions. However, if you need to test a transaction that involves test cases from all levels of risk, the risk assigned at the test case level does little good. You need to know the risk at the transaction level (Figure 2). This often means performing another risk assessment at the transaction level.
Figure 1 - Risk Assignments at the Test Case Level
Figure 2 - Risk at the Transaction Level
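To make the roll-up concrete, here is a minimal sketch of one way transaction-level risk could be derived from test-case risk levels. It assumes a simple low/moderate/high ordinal scale and lets the transaction inherit the highest risk of any test case it touches; the test case names, levels, and the max-of-parts rule are illustrative assumptions rather than a prescribed method, and as noted above a transaction may still warrant its own assessment.

```python
# Hypothetical sketch only: risk levels, test case names, and the
# "highest risk of any part" rule are illustrative assumptions.

RISK_ORDER = {"low": 1, "moderate": 2, "high": 3}

# Risk assigned per test case, as in Figure 1
test_case_risk = {
    "TC-01 login": "low",
    "TC-02 add item to order": "moderate",
    "TC-03 submit payment": "high",
}

def transaction_risk(test_case_ids, case_risk=test_case_risk):
    """First-cut transaction risk: the highest risk level among the
    test cases the transaction touches, as in Figure 2."""
    return max((case_risk[tc] for tc in test_case_ids),
               key=lambda level: RISK_ORDER[level])

# A transaction spanning all three test cases is treated as high risk,
# even though two of its parts were rated low or moderate in isolation.
print(transaction_risk(["TC-01 login",
                        "TC-02 add item to order",
                        "TC-03 submit payment"]))   # -> high
```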
#13 - The "cry wolf" syndrome. As already stated, sometimes a risk is identified yet fails to materialize. When the same risks are raised again and again, even rightfully so, yet never materialize, people stop believing them. Examples of this are hurricane forecasts, tornado warnings, and economic forecasts (such as recessions). So, like the boy who sounded the alarm in jest about a wolf, sometimes a risk may sound like just another exaggerated concern. Unfortunately, sometimes the wolf (or risk) actually appears!
The Safety Net
There is a word often used in conjunction with risk that people sometimes omit. That word is "contingency".
Contingencies are needed because we have a rich history of real-life events failing to match the risk assessment. Think of a contingency as a "Plan B" in case a risk materializes.
In the insurance industry, reserves are established to cover higher levels of loss than normal premiums may cover. Minimum reserve levels are set by law. An insurer may set higher levels if it needs more assurance that it can cover unexpected losses. An insurance company may also obtain coverage through reinsurance companies to cover large or catastrophic losses. These kinds of protections have been established to safeguard policyholders and to help assure the financial stability of the insurer.
There is debate among project managers about "padding" estimates. Some feel that the estimate should be carefully calculated as accurately as possible and that should be the actual working estimate. Others feel that this approach is a recipe for disaster because there is no room for dealing with contingencies.
I blame senior management and customers as the root cause of this debate. The problem is that management and customers are notorious for taking any estimate and reducing it by X%. Some people believe that all estimates contain padding; therefore, all estimates can (and must) be reduced to "more reasonable" levels.
I propose a healthier view of this debate. Instead of padding, I prefer to call these "project reserves". When used as intended, project reserves are a good thing. They help us deal with the unexpected.
Problems arise when people abuse reserves. An example is when people steal time from the reserve to compensate for poor project decisions. It's one thing to use a reserve for the extra time needed because a supplier is late, but another thing to use it because developers are creating highly defective software.
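As a rough illustration of keeping reserves honest, here is a minimal sketch of an estimate that carries its reserve as a separate, visible quantity and records every draw against it with a reason. The class, field names, and numbers are hypothetical; the point is simply that an explicit reserve leaves a trail, while padding hides one.

```python
# Hypothetical sketch: an estimate with an explicit project reserve instead
# of hidden padding. Numbers, names, and reasons are illustrative only.

class ProjectEstimate:
    def __init__(self, base_days, reserve_days):
        self.base_days = base_days        # the carefully calculated working estimate
        self.reserve_days = reserve_days  # explicit reserve for contingencies
        self.draws = []                   # record of every draw against the reserve

    def draw_reserve(self, days, reason):
        """Spend reserve time and record why, so use (or abuse) stays visible."""
        if days > self.reserve_days:
            raise ValueError("Reserve exhausted - re-plan instead of hiding the overrun")
        self.reserve_days -= days
        self.draws.append((days, reason))

estimate = ProjectEstimate(base_days=60, reserve_days=9)   # e.g., a 15% reserve
estimate.draw_reserve(3, "Supplier delivered the test environment late")
print(estimate.reserve_days, estimate.draws)
```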
I believe project reserves are needed and form one point of balance when we are fooled by risk.
Another form of balance is the contingency plan. Reserves are just time and money - they don't tell you what to do, but a contingency plan does. You can have a contingency plan for just about any project activity. However, there are some major activities that deserve priority consideration. Here are some examples of situations that justify a contingency plan:
- What if the requirements are inadequate?
- What if the degree of requirements change is excessive?
- What if high levels of defects are discovered during testing or reviews?
- What if severe problems are encountered during implementation?
These contingencies can also be addressed in a risk mitigation strategy.
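One way to keep those "what if" questions actionable is to pair each one with a trigger and a contingency action in a simple register. The sketch below is a hypothetical illustration; the field names, thresholds, and owners are assumptions, not a prescribed format.

```python
# Hypothetical sketch: recording contingency plans for the "what if" questions
# above. Field names, thresholds, and owners are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Contingency:
    risk: str      # the "what if" being planned for
    trigger: str   # observable condition that activates the plan
    plan_b: str    # the contingency action
    owner: str     # who invokes the plan

contingency_plan = [
    Contingency(
        risk="Requirements are inadequate",
        trigger="More than 20% of requirements fail review",
        plan_b="Hold a requirements workshop before design continues",
        owner="Business analyst lead",
    ),
    Contingency(
        risk="High levels of defects are discovered during testing",
        trigger="Defect arrival rate is still climbing one week before the planned test exit",
        plan_b="Draw on the project reserve and re-test the highest-risk transactions first",
        owner="Test manager",
    ),
]

for c in contingency_plan:
    print(f"{c.risk}: if {c.trigger}, then {c.plan_b} ({c.owner})")
```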
The reasonable conclusion is that every risk assessment should also address project reserves and contingencies.
Summary
The better we understand how we can be fooled by our approach to risk assessment, the better we can develop a rational approach to keeping the project on track. The key is not to rely on just one aspect of the risk picture. We must also balance risk with contingencies to compensate for the eventual shortcomings of any risk assessment or approach.