Ask a hypothetical C++ programmer what the most common mistakes are, and you will most likely hear: null pointers, division by zero, undefined behavior, array out-of-bounds access, uninitialized variables. But this list has little to do with reality. Rather, it is a set of mistakes we all heard about while learning to program or read about in books. In practice, a lot of time and effort goes into entirely different mistakes, which nevertheless stay in the shadows during discussions. Let's talk a little about this interesting topic.
As I said, the programmer will most likely name division by zero and uninitialized variables among the main sources of trouble. Such mistakes do exist, of course, but their relevance is, to put it mildly, exaggerated.
Due to the nature of our work, we check a large number of open source projects. Based on that experience, I can assert that division by zero and access to uninitialized memory are very rare errors.
First, there are few division errors because, in fact, programs do not need to divide something all that often :). Second, everyone knows about these errors and checks the divisor value quite carefully. Third, compilers are getting better at finding such errors.
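To make the "checks the divisor value" habit concrete, here is a minimal sketch. The function name `safe_divide` and the choice of `std::optional` as the return type are my assumptions for illustration only; real projects handle a zero divisor in many different ways.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: returns an empty optional instead of dividing
// when the result would be undefined.
std::optional<int> safe_divide(int numerator, int denominator) {
    // The check almost everyone remembers: a zero divisor.
    if (denominator == 0) {
        return std::nullopt;
    }
    // A less famous case: INT_MIN / -1 overflows int, which is also
    // undefined behavior for signed integers.
    if (numerator == std::numeric_limits<int>::min() && denominator == -1) {
        return std::nullopt;
    }
    return numerator / denominator;
}
```

A caller then has to look at the result before using it, for example with `has_value()`, which is exactly the kind of explicit check most programmers already write by habit.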
The situation is similar with uninitialized variables. People are well aware of this error pattern and, in general, write fairly neat code. Fortunately, many agree that a variable should have the smallest possible scope, and that it is good style to declare a variable right where it is initialized. In addition, compilers have become smart enough to warn about the use of an uninitialized variable.
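The style described above can be sketched as follows. The function `sum_positive` is a made-up example, and the risky variant is kept in a comment so that the snippet itself stays well-defined.

```cpp
#include <vector>

// Risky style (shown only as a comment): the variable is declared early
// and assigned on just one branch, so it may be read uninitialized.
//
//     int sum;
//     if (!values.empty()) { sum = 0; }
//     ... use sum ...   // undefined behavior when values is empty
//
// Neater style: the variable is declared at the point of initialization
// and lives in the narrowest scope that needs it.
int sum_positive(const std::vector<int>& values) {
    int sum = 0;               // declared and initialized in one place
    for (int value : values) { // 'value' exists only inside the loop
        if (value > 0) {
            sum += value;
        }
    }
    return sum;
}
```

With this layout there is simply no point in the function where `sum` exists without a value.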
All in all, such errors are rare in real projects. Why, then, are they remembered so often? Perhaps programmers ran into these errors when they were just learning to program. Using an uninitialized variable in one's very first programs is a very plausible scenario, and the very first experience leaves the most indelible mark :). Another possibility is that the fears are rooted in old books, written when even more dangerous programming languages were in use and compilers were extremely weak at identifying potential defects.
Okay, so what about undefined behavior, null pointers, and array overruns? There are indeed many such errors, and they are often quite difficult to detect, so the worries about them are quite justified. Look at how many null pointer errors we have collected in open source projects: V522, V595, V757, V769, V1004, and so on. Programmers expect problems here and indeed face these very problems. So let's get to the fun part as soon as possible: the mistakes that are not talked about, yet there are a hell of a lot of them.
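As an illustration of why null pointer errors are hard to spot, here is a sketch of one familiar shape: a pointer that is dereferenced first and checked only afterwards. The `User` type and the `name_length` function are hypothetical, not taken from any project we checked, and the faulty ordering is shown as a comment so the snippet stays safe to run.

```cpp
#include <cstddef>
#include <string>

// Hypothetical type used only for this illustration.
struct User {
    std::string name;
};

// Faulty ordering (comment only): the check comes too late to help.
//
//     std::size_t n = user->name.size(); // dereferenced here...
//     if (user == nullptr) return 0;     // ...and verified only afterwards
//
// Corrected ordering: verify the pointer before the first use.
std::size_t name_length(const User* user) {
    if (user == nullptr) {
        return 0;
    }
    return user->name.size();
}
```

The faulty version looks defensive at a glance because the check is present, which is part of what makes this kind of error easy to miss in review.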
I argue that there are vast classes of errors that cause a lot of problems but are undeservedly little discussed. I am talking about typos.
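For readers who want something concrete, here is a small invented example of the kind of typo I mean. The `Point` type and both function names are assumptions made up for this sketch. The code with the typo compiles cleanly and even works for many inputs.

```cpp
// Hypothetical type used only for this illustration.
struct Point {
    int x;
    int y;
};

// Typo: the second comparison should use rhs.y, but rhs.x slipped in.
bool equal_with_typo(const Point& lhs, const Point& rhs) {
    return lhs.x == rhs.x && lhs.y == rhs.x;
}

// What the author meant to write.
bool equal_fixed(const Point& lhs, const Point& rhs) {
    return lhs.x == rhs.x && lhs.y == rhs.y;
}
```

For the points {1, 1} and {1, 5} the version with the typo reports equality, while for {1, 2} compared with itself it reports inequality. Nothing crashes and nothing is undefined; the program is just quietly wrong.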
"Ha," the reader will say, "I know about typos. What is so secret about that?" There is no secret, but the problem is clearly underestimated. This can be seen, at the very least, from how the test suites used to evaluate analyzers are built.