Determinism

Having been around for a long time, I often notice that when I use a term like ‘determinism’, I mean something slightly different, and somewhat deeper, than what most people mean.

In general, something is deterministic if, no matter how often you do it, the results are always the same. Not similar, or close, but actually the same.

Computers are interesting beasts. They combine the abstract formalism of mathematics with a strong footprint in reality, as physical machines. Determinism is an abstract concept. You do something, and 100% of the time the results are the same. That we pile massive amounts of instructions on top of these formal systems, and interpret them with respect to our humanity, does not change the notion of determinism. What does mess with it a bit is that footprint in reality.

Hardware is physical and subject to the informal whims of the world around us. So, sometimes it fails.

Within software, though, we effectively disconnect ourselves from that binding to reality. We ignore it. So, we do say that an algorithm is deterministic, in the abstract sense, even if it is running on hardware that injects some nondeterminism into the mix. I could probably go on forever about that touchpoint, but given that we choose to ignore it, the abstract sense is all that really matters.

So, in that sense, without respect to reality, we can say that an algorithm is deterministic. Given the same inputs, you will always get the same outputs, every time. More importantly, determinism is a mandatory property of anything that actually is an algorithm. We do have a term for sets of instructions that do not work absolutely reliably, that are really just best efforts: we call them heuristics. A heuristic will do its best to get an answer, but for any number of reasons, it will not be 100%. It may be 99.9999%, but that 0.0001% failure rate, when you run things often enough, is actually significant.

All of this is more important than just being a theoretical discussion. What we need from software, and what people expect from it, is determinism. They need software they can rely on, each and every time they go to use it. It is the core unstated requirement of basically every piece of software out there, with the exception of problems that we know are theoretically impossible, or close to it. A heuristic would never do when an algorithm exists.

The classic example of this is hiding in plain sight. A graphical user interface is a ‘pretty’ means of interacting with computers. You do something like press a button on-screen, and that triggers one or more computers to do some work for you. That’s nice.

You press the button, and the work gets done. The work itself should be deterministic. So, each time you press the button, the results are the same.

No doubt people have seen plenty of interfaces where this is not true. In the early days of the web, for example, we had a lot of issues with ‘double clicks’, until we started building in double-click protection to ignore the second click if an earlier one was still in play. We did that to avoid burning resources, but we also did it to restore some determinism to the interface. People would get annoyed if, for example, they accidentally double-clicked and that caused the software to break or do weird things. It would ‘bug’ them, but really, what it did was violate their expectation that their interaction with the interface was deterministic, which is key.
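
The same guard shows up anywhere a duplicate trigger is possible, not just in browsers. As a minimal sketch of the idea in shell, assuming a Linux-style system where flock(1) is available (the lock path and messages here are made up for illustration), a script can ignore a second invocation while an earlier one is still in play:

```sh
#!/bin/sh
# Sketch: ignore a repeated trigger while an earlier run is still in play,
# the script-level equivalent of double-click protection.
# Assumes flock(1) is available; the lock path is illustrative.

exec 9> /tmp/do-work.lock   # open (or create) the lock file on fd 9
if ! flock -n 9; then       # try to take the lock without waiting
    echo "a previous run is still in progress; ignoring this one" >&2
    exit 0                  # drop the duplicate, just like the second click
fi

# ... the actual work goes here; it runs at most once at a time ...
echo "doing the work"
```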

So, a single click can and should be deterministic, but what about a series of them?

One of the bad habits of modern programmers is that they push too much of their workload into GUIs. They think that because there is an interface where they can click on everything they need, and because each click is in itself deterministic, the GUI is a good way of getting tasks done. The problem is not the buttons, but what lies between them.

If you always have to click 3 buttons to get a specific result, it is probably fine. But once that grows to 10 buttons, or 50 buttons, or, as it seems in some cases, 100 buttons, the determinism fails rather dramatically. It’s not the software, though; it is the person in between. We are heuristic. Experts strive to be deterministic, but we are battling against our very nature to be absolutely precise absolutely every time. And that plays out, as one might expect, in long button sequences. Sometimes you hit all 100 in the right order, as desired, but sometimes you don’t. Maybe you hit only 99 of them, or somewhere in the middle the order is slightly different. The details don’t matter: we know that people are not deterministic, and we can absolutely depend on that being the case.

If you wired up one button to hit the other 100, then you are back to being deterministic again, but if you don’t do that, then using the GUI for any non-trivial task is non-deterministic, simply because people are non-deterministic.

This is exactly why so many old and experienced programmers keep trying to get people to script stuff instead. If you have a script, and you give it the same inputs, then, if it was written properly, it will give you the exact same outputs every time it runs. And it is easy to layer no-argument scripts on top of scripts that have some variability, which makes things even better.
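
For instance, in a hypothetical sketch (the script names, the version number, and the steps are all made up for illustration), the lower script carries the variability as arguments, while a no-argument wrapper pins every choice down:

```sh
#!/bin/sh
# deploy.sh: a lower-level script with some variability, taken as arguments.
# Usage: ./deploy.sh <environment> <version>
set -eu
ENVIRONMENT="$1"
VERSION="$2"
echo "deploying version $VERSION to $ENVIRONMENT"
# ... the real deployment steps would go here ...
```

```sh
#!/bin/sh
# deploy-prod.sh: a no-argument wrapper layered on top.
# Running it always means exactly one thing, every time.
set -eu
exec ./deploy.sh production 1.4.2
```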

If you were going to do a big release of complicated software, and the release process were a bunch of button clicks in a bunch of different apps, you would be asking for trouble. But if it were just one script called ‘release.sh’ in one place, with no arguments, then your release process would be fully, completely, and totally deterministic.
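
A minimal sketch of what that might look like (the individual steps are placeholders, not any particular project’s process):

```sh
#!/bin/sh
# release.sh: one script, one place, no arguments.
# Every run performs exactly the same steps, in exactly the same order.
set -eu                       # stop at the first failure instead of limping on

echo "running the tests"      # refuse to release anything that fails its tests
# make test

echo "building the artifacts" # always build from the same, clean starting point
# make build

echo "tagging and publishing" # record exactly what went out, then ship it
# git tag "v1.4.2" && make publish

echo "release complete"
```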

If there is some unwanted variability that you’ve injected into the process, it acts as a particularly nasty bit of friction. First, it should scare you to do a release if there is a possibility that you might do it incorrectly. Second, when it is incorrect, the cleanup from having messed it up is often quite expensive. What happens then is that it might work a few times initially, but then people get tired and it goes wrong. Then they get scared, and either everything slows down out of fear, or it keeps going wrong, which makes it all worse.

That, then, is why determinism is just so important to software developers. It might be easy to play with a GUI and do things, but you’ve given up determinism, which will eventually come back and bite you, just when you can’t afford that type of mistake. It’s high risk and high friction, both of which make it harder to get stuff done as needed.

It takes a lot longer to script everything, but once you are on your way, it gets easier and easier, as you’ve built up the foundations for getting more and more stuff done. As you go, the scripts get battle-tested, so they rather naturally act as their own test harness. If you fix the scripts instead of avoiding them, you get to the point where tasks like releases are so easy and reliable that there is very little friction to getting them done. The only thing limiting how frequently you run them is whether they are needed right away. This is the root of ideas like CI/CD pipelines: you’ll have to release often, so it needs to be deterministic.

Determinism plays out in all sorts of other ways within software. Usually, the lack of it triggers relatively small side effects that are too often ignored, but they build up. If you look for it in the code, in the technologies, in the process, and everywhere else, you find that getting closer to it, or achieving it, drastically reduces friction, which makes the job better and far less painful.

So determinism is more than just a property of state machines, the entropy of hardware, or the noise on a network. It is a fundamental necessity for most of the solutions we build.
