You're probably using "unsigned" incorrectly, and that makes me sad.
Chances are that if you write code in C (or related languages like Java, C#, or C++), then you've come across the "unsigned" type, and its relatives "unsigned long" and "unsigned short". If you've written code that uses unsigned types, it's also quite likely that you've used them incorrectly, at least by my standards.

Misuse of "unsigned" in C is one of those things that I keep seeing over and over, with different developers, even folks who really ought to know better. I find it immensely frustrating. If I had to pick one aspect of C that was responsible for more stupid bugs than anything else, this'd be one of the top candidates. Probably not the top candidate - the string-handling functions in the standard library probably win that handily.
Here are my simple rules for the use of unsigned integer types:
- Don't use unsigned just because "that value should never be less than zero"
- Always compile your code with all warnings enabled
- Avoid mixing the use of signed and unsigned integers in the same calculation
- Do use unsigned when modelling hardware registers that hold unsigned values
- Do use unsigned when performing bit-wise arithmetic
Don't use unsigned just because "that value should never be less than zero"
This is by far the most common abuse of unsigned types that I see on a regular basis. It's not even a bad idea, as far as it goes. A majority of the values in a typical program are going to be non-negative by design - sizes, screen coordinates, loop counters, etc, etc. The problem really isn't unsigned values per se, it's how unsigned and signed values interact.

Part of the problem is that constant values in C are signed by default, which means that signed values will creep into your program unless you make a concerted attempt to avoid them. When you compare signed and unsigned values, the results will often not be what you expect. For example:
#include <stdio.h>

int main(void)
{
    unsigned four = 4;
    int neg_one = -1;

    if (neg_one < four)
    {
        printf("true\n");
    }
    else
    {
        printf("false\n");
    }

    return 0;
}

Looking at this code, it's pretty obvious what the programmer intended, but in fact the comparison "neg_one < four" evaluates to false in this case. This is because the signed value will be "promoted" to unsigned, turning it from a small negative number to a very large positive number, before the comparison is made.
In actual cases of this problem in the wild, the declarations will typically be a long way away from the comparison, and it won't be at all obvious what the cause of the problem actually is. I've seen experienced programmers stare at the debugger in disbelief when it seems to be showing them that their program thinks that -1 is greater than 4. An additional complication is that constants in C are signed by default, so you can replace the "neg_one" variable in the example with the constant "-1", and you'll get the same behavior.
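To make that concrete, here's a minimal sketch of the constant form of the same trap, along with one way (just one option, and it assumes the unsigned value is known to fit in an int) to get the comparison you actually meant:

#include <stdio.h>

int main(void)
{
    unsigned four = 4;

    /* -1 is a signed constant; it gets converted to a huge unsigned
       value before the comparison, so this prints "false". */
    printf("%s\n", (-1 < four) ? "true" : "false");

    /* Casting the unsigned operand back to int (safe here, since we
       know 4 fits in an int) makes this print "true". */
    printf("%s\n", (-1 < (int)four) ? "true" : "false");

    return 0;
}

A compiler with the sign-comparison warning enabled will flag the first comparison, which is exactly the point of the next rule.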
A related problem comes with the handling of sizes and lengths. A size is typically going to be a non-negative value, so it "makes sense" to use unsigned variables. The problem is that sizes are often calculated by subtracting one value from another. If you accidentally subtract a larger value from a smaller one with signed variables, you get a negative size, which you can at least detect and handle (with an assert(), if nothing else). If you're using unsigned math, you just get a huge bogus "size", which may or may not be immediately obvious.
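Here's a sketch of that failure mode; the names and values are invented for illustration, and the exact wrapped value depends on how wide unsigned is on your platform:

#include <stdio.h>

int main(void)
{
    /* Signed version: a reversed subtraction gives -25, which is
       obviously wrong and easy to catch (with an assert(), say). */
    int start = 100, end = 75;
    int length = end - start;
    printf("signed length: %d\n", length);

    /* Unsigned version: the same mistake wraps around to a huge
       bogus "size" - 4294967271 if unsigned is 32 bits wide. */
    unsigned ustart = 100, uend = 75;
    unsigned ulength = uend - ustart;
    printf("unsigned length: %u\n", ulength);

    return 0;
}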
Always compile your code with all warnings enabled
Admittedly, this rule is more general, rather than specifically tied to problems with using "unsigned" correctly. Most C and C++ compilers have an option to warn on comparisons between signed and unsigned values, when there's a chance the comparison will be interpreted incorrectly. It's even more frustrating to debug one of these issues when compiling with warnings enabled would have produced a warning message that points to exactly where the problem is, but some yutz has that particular warning disabled.

Of course, they have it disabled because enabling warnings on comparisons between signed and unsigned tends to generate zillions of bogus warnings. That's just a good reason to avoid using unsigned variables where possible - it obscures the actual problem areas with bogus warnings.
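For what it's worth, with gcc (clang behaves much the same) the specific warning is -Wsign-compare, and a command line along these lines - the file name is just a placeholder - will catch the comparison in the example above at compile time:

# -Wall enables the common warnings; -Wextra adds -Wsign-compare for C
# (for C++ it's already part of -Wall), and -Werror makes sure the
# warning doesn't scroll past unnoticed.
gcc -Wall -Wextra -Werror example.c -o example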
Avoid mixing the use of signed and unsigned integers in the same calculation
Given the example above of a simple comparison going wrong, it ought to be obvious that anything more complex is at least as likely to go subtly wrong in some way. Again, the real problem arises because the declarations of the variables (and constants) will be far, far away from the point of the errant calculation; the sketch below shows one way a mixed calculation can bite.

So, when is it okay to use unsigned types?
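Here's that sketch - a hypothetical example (the names and values are invented for illustration) of a signed offset applied to an unsigned length:

#include <stdio.h>

int main(void)
{
    unsigned length = 10;
    int offset = -20;

    /* The usual arithmetic conversions turn 'offset' into a large
       unsigned value, so instead of -10 this prints 4294967286
       (with 32-bit unsigned) - the sum wraps rather than going
       negative. */
    printf("%u\n", length + offset);

    /* Doing the arithmetic in a signed type preserves the expected
       result, which can then be range-checked before use. */
    long adjusted = (long)length + offset;
    printf("%ld\n", adjusted);

    return 0;
}

The same wrap-around shows up in loop counters, index arithmetic, and anywhere else a signed value sneaks into unsigned math.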
5 comments:
This is totally about me, isn't it. :(
My most favorite subject: unsigned types; and no mention of the U suffix. Ugh. How depressing. - lance
Yeah, good point. I should add that.
I'm curious what your thoughts are on this article:
http://embeddedgurus.com/stack-overflow/2009/07/efficient-c-tips-10-use-unsigned-integers/
He's very adamant about unsigned integers. Could you comment on why you agree/disagree with him?
I have exactly the opposite belief: never use a signed type unless the quantity we are modeling is signed by nature. Of course, turn on ALL the warnings and treat warnings as errors.