Error Handling

Intro

Error handling is a big topic. I am not going to discuss the details of how errors occur and why they must be handled. Instead, I will focus on comparing the two popular approaches to handling them: exception-handling and return-error-code.

I will try to make the case that exception-handling is the technically superior of the two. However, both approaches require adopting a certain discipline in coding to implement error handling correctly; otherwise, one can make mistakes in either approach.

The Problem

Let us first understand the problem. When a piece of code or a function call fails, we have an error. For example, trying to open a file for reading when no file exists at the specified path is an error: the fopen(...) call will fail.

Now fopen can’t possibly know how the caller intends to work around this error. Therefore:

Either it can ignore the error and continue running its code, which will at some point cause the program to crash. Or,
It can realize that it can't proceed any further, and therefore must return control to its caller and communicate the error to it.

Now that the call to open the file has returned, the caller will want to proceed. But at this point, it needs a way to know whether the call succeeded or failed. Therefore, fopen must have some way to communicate success or failure to its caller (fopen does this by returning a valid FILE pointer on success or a NULL pointer on failure). Otherwise, the caller cannot know whether it should proceed further or handle the error.

Since fopen communicates its success or failure through its return value, we call this the return-error-code style of error handling.

In code, this would typically look like this:

```
#include <cstdio>

void read_file()
{
    std::FILE* fp = std::fopen("path/to/file.txt", "r");   // "r" opens in text mode
    if (fp)
    {
        // .... read and do your logic
        std::fclose(fp);
    }
    else
    {
        // error case; what to do here?
        //  perhaps try an alternate file name, etc ...
    }
}
```

One thing should be very clear from this simple example. When a function runs into an error, it must not continue; it must return, thus transferring control to its caller.

Second, it must communicate the error to its caller (or to whatever code control is transferred to).

Third, the code that receives this error information must check the value for validity before using it, and it has to decide what to do in the else case. Perhaps the code has enough context to handle the error, say by trying an alternate name or a different path. But sometimes the code may not know what else to do! In that case, it clearly must not proceed further. So what must it do? Return to its caller! And what must its caller do? The same check! If the returned value is valid, it proceeds with the next line of code; otherwise it too must either handle the error or, if it can't, return.

Error Handling Basics

Take a moment to ponder the above. What we are basically saying is that when errors are returned as error codes, the caller must:

Check the returned value.
If the value signals an error, proceed further iff (if and only if) it can handle or mitigate the error.
Otherwise it must return control to its caller.
And the caller must do the same logic; either handle or continue the return chain.
Until this sequence of call-returns reaches code where the error can be handled.
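The caller rules above can be sketched as a three-level C++ call chain. Everything here is illustrative and hypothetical: the Err enum, the functions open_config, load_settings and init_app, and the idea of mitigating by trying a fallback path. Each level checks the returned code; load_settings cannot handle the error and only passes it up, while init_app has enough context to mitigate it.

```cpp
#include <cstdio>

// Hypothetical error codes shared by the functions in this chain.
enum class Err { Ok, FileNotFound };

// Lowest level: detects the failure from fopen and reports it by return code.
Err open_config(const char* path, std::FILE** out)
{
    *out = std::fopen(path, "r");
    return *out ? Err::Ok : Err::FileNotFound;
}

// Middle level: cannot handle the error, so it must check and return it.
Err load_settings(const char* path)
{
    std::FILE* fp = nullptr;
    Err e = open_config(path, &fp);
    if (e != Err::Ok)
        return e;                      // manual propagation
    // ... read settings from fp
    std::fclose(fp);
    return Err::Ok;
}

// Top level: has enough context to mitigate by trying a fallback path.
Err init_app(const char* primary, const char* fallback)
{
    Err e = load_settings(primary);
    if (e != Err::Ok)
        e = load_settings(fallback);   // mitigation attempt
    return e;
}
```

Note that the if-check in load_settings exists purely to keep the return chain intact; it does no handling of its own.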

In practice, therefore, this approach forces every function call to be followed by an if-else check, and this discipline must be maintained throughout the entire call chain. If any function in the chain omits the check, the program potentially has a bug: an error code propagating up from far below is dropped at that point without ever being handled.

This is indeed the cause of many bugs. The more complex the code and the deeper the if-else nesting, the greater the likelihood of such an omission. Unfortunately, in languages such as C and C++, the compiler does not enforce this. It is all on the programmer to adopt the discipline of following every function call with an if-else check, at every level of the chain.
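The following sketch shows how a single omitted check breaks the chain. The names are hypothetical, and open_config is simulated to always fail so that the example is self-contained.

```cpp
// Hypothetical error code type, as in a return-error-code code base.
enum class Err { Ok, FileNotFound };

// Simulated low-level call that always fails, to keep the sketch self-contained.
Err open_config(const char* /*path*/)
{
    return Err::FileNotFound;
}

// Correct middle layer: checks the result and continues the return chain.
Err load_settings(const char* path)
{
    Err e = open_config(path);
    if (e != Err::Ok)
        return e;
    // ... use the opened file
    return Err::Ok;
}

// Buggy middle layer: the check was forgotten, so the return chain breaks here.
Err load_settings_buggy(const char* path)
{
    open_config(path);   // result silently discarded
    // ... carries on as if the file were open
    return Err::Ok;      // its caller is told everything succeeded
}
```

In C++17 and later, marking open_config as [[nodiscard]] would typically make compilers warn about the discarded result in load_settings_buggy, but that is only a warning, and only for functions that carry the attribute.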

Error Handling Steps

In light of the above explanation, we can summarize error handling as comprising three distinct steps:

Error Detection
Error Propagation
Error Mitigation

Any function call can fail. If the function reports an error through a return code (like fopen above), the caller must check whether the returned value is valid or invalid. This is the detection phase.

Once the caller has determined that an error occurred, it has only two options: either it can solve / handle / mitigate the error, or it can't. If it mitigates the error, the program is no longer in error, and regular control flow can resume.

But if it can't handle the error, it must not continue. It can do only one thing: return to its caller, passing along the error code it encountered. It may add its own context or, more likely, return a different error, one that makes sense between it and its caller. Either way, if it can't handle the error, it must propagate it up the call chain. This is the propagation phase.
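A sketch of the propagation step, where a function that cannot mitigate a low-level failure returns a different error that makes sense to its own callers. The two enums and both functions are hypothetical; read_file merely returns a simulated result instead of doing real I/O.

```cpp
// Hypothetical low-level and high-level error vocabularies.
enum class IoErr { Ok, NotFound, PermissionDenied };
enum class ConfigErr { Ok, ConfigUnavailable };

// Simulated low-level call; the outcome is passed in to keep the sketch self-contained.
IoErr read_file(IoErr simulated)
{
    return simulated;
}

// Cannot mitigate an I/O failure itself, so it propagates it,
// translated into an error that makes sense to its own callers.
ConfigErr load_config(IoErr simulated)
{
    IoErr e = read_file(simulated);
    if (e != IoErr::Ok)
        return ConfigErr::ConfigUnavailable;   // propagation with translation
    // ... parse the file contents
    return ConfigErr::Ok;
}
```

Callers of load_config only need to know about ConfigErr; the low-level distinction between NotFound and PermissionDenied stays below this layer.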

At some point, a function high up in the call chain will have enough context to finally be able to handle / mitigate the error. Once handled, the normal program flow can resume. This is the mitigation phase.

So, in summary, an error must first be detected. Then it must be propagated to a point where there is enough context. At that point, the error can be mitigated.

Error Handling Via Exceptions

In the exception-handling style, a function does not report an error by returning an error code to its caller. Instead, it throws an exception. Compilers implement throwing as stack unwinding: control flow automatically travels up the call stack to a point where the programmer has explicitly indicated that the error can be mitigated, typically a catch clause.

So let us see how exceptions perform error handling.

First is detection. If the called code reports errors through a return code, the caller must do an if-else check. If, however, the called code throws an exception when it encounters an error, control does not return to the caller at the call site. The caller gets no chance to inspect the error, nor to proceed further in its code; the latter is exactly what we want.

The second step is propagation. Here, unlike return-error-code, the caller doesn’t have to do anything. The compiler takes care of setting up the code necessary to propagate the error to a point higher up in the stack where it can be mitigated.

The third step is mitigation. Here the programmer has to specify where in the code an error can be handled or mitigated, by catching the exception. The catch clause must therefore actually perform mitigation steps. If there are none, there is perhaps no need to catch the exception at all.

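```
#include <cstdio>
#include <exception>
#include <stdexcept>

// Low-level code: detect the error and throw instead of returning an error code.
std::FILE* open_file(const char* path)
{
    std::FILE* fp = std::fopen(path, "r");
    if (!fp)
        throw std::runtime_error("Failed to open file ....");
    return fp;
}

// ... and elsewhere in the code, higher up the call chain, catch the exception to handle it.
void run()
{
    try
    {
        std::FILE* fp = open_file("path/to/file.txt");
        // ... success-path logic only
        std::fclose(fp);
    }
    catch (const std::exception& e)
    {
        // ... somehow handle it
    }
}
```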

(Aside: It is sometimes OK to catch an exception to add more context or to log it, but that is more an implementation detail, and even that should be done sparingly, lest you clutter the code with needless catches.)
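As a sketch of the aside, here is a mid-level function that catches only to add context and then throws again, so propagation continues. open_file and load_config are hypothetical names; std::runtime_error is the standard exception type.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical low-level function that reports failure by throwing.
void open_file(const std::string& path)
{
    throw std::runtime_error("cannot open " + path);
}

// Mid-level function: it cannot mitigate the error, but it catches only to
// add context and then throws again, so propagation continues upward.
void load_config(const std::string& path)
{
    try
    {
        open_file(path);
    }
    catch (const std::exception& e)
    {
        throw std::runtime_error(std::string("loading config: ") + e.what());
    }
}
```

If the original exception object must be preserved rather than just its message, the standard library's std::throw_with_nested (in <exception>) is one alternative.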

Comparison

For me, this analysis really makes it easy to compare the two approaches. Both approaches require three steps: detection, propagation, mitigation.

Detection depends on the code you call, not on the style you choose. Code that reports errors via error codes always requires an if-else at the call site, whichever style you use yourself; code that reports errors with exceptions requires no logic at the call site. So in this regard, both approaches are similar.

Mitigation, the third step, is also similar in both approaches. Both will need to handle the error assuming they have enough context.

Propagation is the only step that differs between the two.

In exception-handling style, propagation is automatic and implemented by the compiler.

In return-error-code style, propagation is manual and has to be implemented by the programmer.

For this simple reason, I prefer exception-handling style!

Advantages

Cleaner code; code is not littered with if-else. With if-else checks, what should be a single line of a function call ends up becoming a cluster of 5 lines.
There are no nested if-else blocks. With if-else checks it gets even worse, because the success branch nests further inward with each call, resulting in ever-deeper nesting.
The code looks more logical. You just code for the success path; every line of code is a function call, making up a step of your logic.
You don't have to worry at each call site about whether or how an error can be handled, or what to do if a call fails. Errors are handled elsewhere.
Since you don't have to check the return values of functions, you can use them directly as arguments to further calls, for example f1 (f2 (f3())). If the returned values could also signal failure (as fopen's does), this style of programming would not be possible.
Automatic stack unwinding works together with the RAII (Resource Acquisition Is Initialization) idiom. In C++, stack objects are destroyed automatically when they go out of scope, including when an exception unwinds the stack. This is an incredible compiler-provided facility that is sadly missing in so many other modern languages. If you write your classes this way, you don't have to worry about closing / cleanup behavior for many objects in the face of exceptions.
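The RAII and composition points above can be sketched together. The File class and the functions read_all, count_newlines and lines_in are hypothetical helpers written for this example, not library APIs. File closes its FILE* in its destructor, so lines_in needs no cleanup code, and because failures are thrown rather than returned, the calls compose as f1(f2(f3())).

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper: the constructor acquires the FILE*, and the
// destructor releases it, including when an exception unwinds the stack.
class File
{
public:
    explicit File(const char* path) : fp_(std::fopen(path, "r"))
    {
        if (!fp_)
            throw std::runtime_error(std::string("cannot open ") + path);
    }
    ~File() { std::fclose(fp_); }

    File(const File&) = delete;
    File& operator=(const File&) = delete;

    std::FILE* get() const { return fp_; }

private:
    std::FILE* fp_;
};

// Reads the whole file; a stream error is reported by throwing.
std::string read_all(const File& f)
{
    std::string text;
    int c;
    while ((c = std::fgetc(f.get())) != EOF)
        text.push_back(static_cast<char>(c));
    if (std::ferror(f.get()))
        throw std::runtime_error("read error");
    return text;
}

// Counts '\n' characters in the text.
std::size_t count_newlines(const std::string& text)
{
    std::size_t n = 0;
    for (char ch : text)
        if (ch == '\n')
            ++n;
    return n;
}

// Success path only: no return value doubles as an error signal, so calls nest.
std::size_t lines_in(const char* path)
{
    return count_newlines(read_all(File(path)));
}
```

If File's constructor throws, no File object exists, so there is nothing to close; if read_all throws, the temporary File is destroyed during stack unwinding and the file is closed anyway.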

Do(s) of Error Handling

Pick either exception-handling or return-error-code style; not both.
Since at some point you will most likely have to interface with a system library or some other library that returns error codes, you will have what I call an ‘interface-boundary’. You will need to detect errors at this interface boundary using if-else, but then convert each error into an exception and propagate it as such. This is how to interface your exception-handling code with return-error-code code. But once you do that, don't go back. (More on this in the section below.)
It requires some trial and error in code to get a good grasp of where to do error handling; meaning where to put your catch clauses. Adopt some discipline in your code and identify a few places in your call chain where you can handle errors meaningfully. For all other places, just throw the errors and code as if everything succeeds.
If a called function can return an error code, the client must inspect that value before continuing or using that value.
Use query functions.

Oftentimes the called code throws an exception on error, but the immediate caller could handle the error itself. This forces you to use try-catch in a very narrow scope, which defeats the very purpose of exception handling: the whole idea is to write clean code without having to check and handle errors on every line.

So in this case, use query functions. Instead of opening the file directly and getting an exception, first check separately whether the file exists, through a bool-returning function such as does_file_exist(...), and use it in an if construct:
```
if (does_file_exist (...))
{
    // proceed with opening it
}
else
{
    // try another path
}
```
This local check allows you to verify the existence of the resource before working with it.

Do note that it makes sense to use query functions only if the caller could continue normally in case the file didn't exist. If the caller can't continue in its else clause and would need to inform its caller of the error by throwing an exception, it would have been better to call the file-opening operation directly, without the query function, and let it throw in case of an error.

These kinds of query functions may or may not be available in the library you are working with. For your own code, however, you can write such query functions and build the rest of your code on them. This prevents needless low-level exception handling close to the fault site.
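A minimal sketch of such a query function for your own code, assuming C++17's std::filesystem is available. does_file_exist follows the name used above, but its implementation here is an assumption; choose_config is a hypothetical caller that can continue normally either way.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical query function in the spirit of does_file_exist(...).
// It uses the non-throwing std::filesystem::is_regular_file overload, so a
// failed check is reported as 'false' rather than as an exception.
bool does_file_exist(const std::filesystem::path& p)
{
    std::error_code ec;
    return std::filesystem::is_regular_file(p, ec);
}

// Picks the first candidate that exists; the caller can continue normally
// either way, which is the case where a query function makes sense.
std::filesystem::path choose_config(const std::filesystem::path& primary,
                                    const std::filesystem::path& fallback)
{
    if (does_file_exist(primary))
        return primary;
    return fallback;
}
```

One caveat: the file can still disappear between the query and the actual open, so the open call keeps its own error path. The query only keeps the expected "file is missing" case out of the exception path.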

Don’t(s) of Error Handling

Don't go crazy with too many exception classes. I know that is what most libraries do, but I have found that it clutters up the exception handling code. In practice, most places in the code do not have enough context to handle exceptions effectively; typically you will handle exceptions higher up in your call chain, and at that level a few higher-level exception classes are more than enough. If you have too many low-level exception classes, you will need to sprinkle catch blocks at various mid-levels. At that point, the code is no different from return-error-code style code, and since exception handling is more verbose, it looks even worse! In other words, if your exception handling code starts to look like just an alternative to if-else constructs, you are better off using return-error-code. (Of course, if your underlying library is exceptions-based, you are out of luck; perhaps switch to another library!)
Related to the above: if you end up putting catch clauses at every level, you are likely doing it wrong. If you can mitigate the error at every level, it is easier to communicate it using return-error-code. Exceptions make sense when you can't mitigate the error near where it occurs and must propagate it up the call chain to a more suitable point.
Also related to the above: whether your code looks clean or horrible when using exceptions depends greatly on the library you are interfacing with. Some libraries have overly fine-grained exception classes, promoting a style of exception handling that mimics if-else checks. Don't do it. Instead, arrange your code so that you catch errors at a few choice places, not at every function level. It is perhaps this style of coding that (rightly) gives exception-handling style a bad name!
Once you convert return-error-code code to exceptions, don't go back. This is not just a matter of aesthetics; it is critical for program correctness. Your client (which includes your own code) has an implied contract with the methods it calls: errors are reported either via exceptions or via error codes. Depending on which style is used, the client code is fundamentally different.
If you are just using exceptions, client code does not check or validate values returned from functions. It assumes the success path only and is written with that in mind. But if at any point you switch paradigms, you put the onus on the rest of the functions up the call chain to switch back to if-else style programming, making it necessary for them to check the return value of every call.
Therefore, once you switch from error codes to exceptions, don't go back. If you don't reintroduce the if-else checks, you end up with incorrect programs; and if you do reintroduce them, you lose the advantage of switching to exceptions in the first place and only make your code more confusing.
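The one-way conversion recommended above might look like the following sketch. open_or_throw, load_settings and start_app are hypothetical names. The use of errno assumes a POSIX system, where fopen sets errno on failure (ISO C does not guarantee this). Everything above the boundary is success-path code, and a single broad exception type, the standard std::system_error, is caught in one place near the top.

```cpp
#include <cerrno>
#include <cstdio>
#include <string>
#include <system_error>

// Interface boundary: the only place that inspects fopen's return value.
// A failure is converted once into an exception (assumes POSIX errno semantics).
std::FILE* open_or_throw(const std::string& path, const char* mode)
{
    std::FILE* fp = std::fopen(path.c_str(), mode);
    if (!fp)
        throw std::system_error(errno, std::generic_category(), "fopen " + path);
    return fp;
}

// Above the boundary: success-path code only; it never goes back to checking codes.
void load_settings(const std::string& path)
{
    std::FILE* fp = open_or_throw(path, "r");
    // ... read settings from fp
    std::fclose(fp);
}

// A single catch near the top, using one broad exception type.
bool start_app(const std::string& config_path)
{
    try
    {
        load_settings(config_path);
        return true;
    }
    catch (const std::system_error&)
    {
        // Mitigation would go here, e.g. report the error and fall back to defaults.
        return false;
    }
}
```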

Summary & Conclusion

Any code can fail. Code must not continue past an error until the error is mitigated. If an error can't be mitigated at that point, it must be communicated to a point in the code that can mitigate it. Therefore this error-handling regime can be broken into three distinct steps:

Error Detection
Error Propagation
Error Mitigation

There are two popular ways to implement these: return-error-code (A) and exception-handling (B). In A, the code returns the error (error-code or data) to its caller. In B, it throws an exception which transfers control up the call chain to a point where the programmer has explicitly indicated it can handle the errors by using a catch clause. As argued, steps 1 and 3 are the same in both cases. It is only step 2 that is manual in A but automatic in B. Hence B is the safer and cleaner choice for error handling.

Regardless of the choice, both approaches, like many other things in programming, require discipline, as current languages and compilers can only help so much. For A, the if-else check after every function call must be maintained throughout; no returned value from a function can ever be ignored! For B, some trial and error is needed to settle on a reasonably good placement of catch clauses. Putting catch clauses around every function call defeats the purpose of exception handling, which is, by design, meant to provide a non-local means of handling errors.