The C programming language shouldn't exist.
Over the past two semesters, I had the pleasure of writing a thesis entitled "Code Modernization Techniques Using Clang-Tidy for C23 Checked Arithmetic". The important part here is that I had to research the C standard. What is the C standard? What does it have to say? Where did it all go wrong‽
The C standard is a document that defines the C programming language, and it contains some unusual features. I wanted to make some posts going through things that stuck in my mind.
You can find the standard here.
Observable behavior is a concept that every C programmer should know about. You can think of an observable behavior as something a compiled C program does at execution time.
A statement like int c = a + b; followed by printf("%d\n", c); gets compiled, and is expected to produce some output when it is executed.
The output of the (allegedly) correct program (I don't trust compilers) from this source is an observable behavior.
The standard distinguishes several categories of behavior (implementation-defined, locale-specific, unspecified, and undefined among them), but the important ones to know are:
- Defined behavior: Stuff that the program must do that is explicitly defined by the C standard when a particular statement is compiled.
- Undefined behavior: Things where the standard literally says the program can do whatever it wants.
Undefined behavior can be caused by all manner of malformed statements. Technically, once a C program executes a single malformed statement, the standard imposes no requirements on the entire program; it can literally do anything it wants and still be compliant with the C standard. The running joke here is that the program could make demons shoot out of your nose.
I think there are funnier behaviors a program can have, but I'll bite my tongue (if the program made me do this then it would be compliant with the C standard).
When a compiler runs into a malformed statement, it will often just ignore it. This is for performance reasons: doing nothing is less work than executing some other instructions. There are other cases where something does happen, but the consequences of said instruction are ignored (like an out-of-bounds read).
I want to cover the behavior of signed and unsigned overflow and wraparound.
Every integer is represented using a series of bits. The number of bits in a representation determines the largest and smallest values it can store: 4 bits can represent unsigned values 0 to 15. So, depending on the type of an integer (like a char, int, long long, etc.), you are telling the program to reserve N bits to store a value. Strange things can happen when you try to store a number that is too small or too large for a particular variable.
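To make the "N bits gives you a range" idea concrete, here is a minimal sketch. The function name unsigned_max_for_bits and the demo function are my own, made up for illustration; they are not anything from the standard.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value an unsigned integer made of `bits` bits can hold: 2^bits - 1.
   `bits` must be between 1 and 64. (Hypothetical helper for illustration.) */
uint64_t unsigned_max_for_bits(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    /* Shifting a 64-bit value by 64 positions is itself undefined,
       so that case is handled separately. */
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Illustrative checks; call this from main() to run them. */
void demo_bit_ranges(void)
{
    assert(unsigned_max_for_bits(4) == 15);   /* the 4-bit example from the text */
    assert(unsigned_max_for_bits(8) == 255);
    assert(unsigned_max_for_bits(16) == 65535);
    assert(unsigned_max_for_bits(32) == UINT32_MAX);
}
```

Note that even this tiny helper has to dodge undefined behavior: shifting by the full width of the type is not allowed, hence the special case for 64.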

With unsigned variables (variables that can only hold non-negative values), wraparound occurs when you breach these bounds. This means the stored result is reduced modulo one more than the largest possible value your variable can store. In normal terms, this means your value will NOT be the one you intended to store but some other number that is in range.
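Here is a small sketch of what wraparound looks like in practice. The helper names add_unsigned and store_in_unsigned_char are made up for this post, and the checks assume an ordinary platform where int is wider than unsigned char.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned addition is defined to wrap: the result is reduced
   modulo UINT_MAX + 1. */
unsigned int add_unsigned(unsigned int a, unsigned int b)
{
    return a + b;
}

/* Converting an out-of-range value to an unsigned type wraps the same way,
   here modulo UCHAR_MAX + 1. */
unsigned char store_in_unsigned_char(int value)
{
    return (unsigned char)value;
}

/* Illustrative checks; call this from main() to run them. */
void demo_wraparound(void)
{
    assert(add_unsigned(UINT_MAX, 1u) == 0u);            /* wraps past the top */
    assert(add_unsigned(UINT_MAX, 5u) == 4u);
    assert(store_in_unsigned_char(UCHAR_MAX + 1) == 0);  /* one too large */
    assert(store_in_unsigned_char(-1) == UCHAR_MAX);     /* one too small */
}
```

Every one of these results is guaranteed by the standard, which is exactly what makes the unsigned case "defined".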
With signed variables (ones that can be negative too), you get overflow. The idea is the same, but in practice the value can jump to a very large or very small number depending on the representation the program is using.
Okay, now for the thing that I think is weird: unsigned wraparound is considered defined behavior whilst signed overflow is considered undefined behavior.
Bro what?
This is not a distinction I knew of before and it surprised me. It bewildered me, even.
You would think any kind of overflow or wraparound would be bad, but the C standard explicitly says it's OKAY to do this with unsigned integers. From what I gather, the reason is related to replicating how the underlying hardware operates, and some developers take advantage of this.
I can talk more about this and why I think it's bad, but the gist is this is a weird quirk of the C standard that was very prominent in my thesis.
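Since signed overflow is undefined, the only safe move is to check before you add. C23 ships checked arithmetic for this in <stdckdint.h> (ckd_add, ckd_sub, ckd_mul), but not every toolchain has that header yet, so below is a rough sketch using only <limits.h>. The name checked_add is hypothetical; it copies the "true means it went wrong" return convention of ckd_add, but unlike ckd_add it leaves *result untouched on overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if a + b does not fit in an int.
   Otherwise stores the sum in *result and returns false.
   The check never performs the overflowing addition itself. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* would go above INT_MAX */
        (b < 0 && a < INT_MIN - b)) {   /* would go below INT_MIN */
        return true;
    }
    *result = a + b;
    return false;
}

/* Illustrative checks; call this from main() to run them. */
void demo_checked_add(void)
{
    int sum = 0;
    assert(!checked_add(2, 3, &sum) && sum == 5);
    assert(checked_add(INT_MAX, 1, &sum));    /* overflow detected */
    assert(checked_add(INT_MIN, -1, &sum));   /* overflow detected */
    assert(sum == 5);                         /* untouched after a failed add */
}
```

Compare that with the unsigned case, where plain a + b is always fine. Same operator, completely different rules.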
Okay this one I think is interesting but technically no longer exists as of the C23 standard.
Standards before C23 allowed several ways for an integer to be represented. These are:
- sign and magnitude
- one's complement
- two's complement
Per the C23 standard, compilers are supposed to use two's complement, which is by far the most common representation today. I won't go into detail about each representation, but I again want to talk about a special kind of overflow.
Surprisingly, division of two integers (not floating point numbers) can result in an overflow. If this seems counterintuitive, I suggest you take a moment and think about it; you may even realize how this could happen.
The issue is division by a negative number, specifically -1. In two's complement, there is an asymmetry in which values can be represented: the most negative number cannot be represented as a positive number with the same number of bits. Dividing that most negative number by -1 should produce exactly that positive number, so the result overflows.
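A quick sketch of both the asymmetry and a guard against it, assuming a two's complement int (which C23 requires). The name checked_div is made up for this post; it returns true when the quotient cannot be computed, which also covers division by zero.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if a / b cannot be computed as an int.
   Otherwise stores the quotient in *result and returns false. */
bool checked_div(int a, int b, int *result)
{
    if (b == 0) {
        return true;                   /* division by zero is undefined */
    }
    if (a == INT_MIN && b == -1) {
        return true;                   /* -INT_MIN does not fit in an int */
    }
    *result = a / b;
    return false;
}

/* Illustrative checks; call this from main() to run them. */
void demo_checked_div(void)
{
    int q = 0;
    /* The asymmetry: there is one more negative value than positive values. */
    assert(INT_MIN + INT_MAX == -1);

    assert(!checked_div(INT_MAX, -1, &q) && q == -INT_MAX);  /* fine */
    assert(checked_div(INT_MIN, -1, &q));                    /* overflow */
    assert(!checked_div(INT_MIN, 1, &q) && q == INT_MIN);    /* fine */
}
```

So out of every possible pair of ints, there is exactly one division with a nonzero divisor that overflows.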
There is a logical follow-up to this: if I use another representation, like one's complement, then technically this type of overflow does not occur!
This is all to say, two’s complement sucks and we need to bring back one’s complement.
C was not given to us as a punishment to humanity but was created by other humans. First it was created by Dennis Ritchie and Ken Thompson at Bell Labs in 1972, and it was later standardized in 1989.
Today there are a handful of people that maintain and improve on the C standard: the C standards committee. They have other names like JTC1/SC22/WG14, which all sound like SCP characters, so I'll just refer to them as the standards committee.
The standards committee has a website, and on this website is a very nice picture of all of them. For some reason, the photographer did not want to be credited for the photo they took. I thought this was hilarious; whoever took this picture was like yeah I’m good please DO NOT credit me.
From here I found there is AN OFFICIAL C WEBSITE! This actually just came out earlier this year as part of an effort to rebrand themselves as a modern programming language.
This rebranding is documented in N3408 and has more interesting little tidbits like a branding discussion on the C logo. They even take a jab at how people often use the C++ logo with the "++" removed by saying "[C was] established before C++ (C with Classes) was even conceived, let alone its logo".
I don't know much about the culture of these committees but I like to believe this is the standards equivalent of throwing someone across the room.
They're also looking at creating an official repository on GitHub and a URL shortening service for their documents, which I can attest is much needed.
I am very curious how this will affect the standard and the broader community. Can people just open issues on the C standard? Could someone pull off a type of "standards attack" where they contribute malicious behaviors to the standard (e.g. nasal demons are now defined if you have an overflow)?
All in all, I do appreciate what these wonderful people do and am excited to see that this language is still being worked on all these years later.