Is Software the UFOlogy of Engineering Disciplines?

One area where software development lags far behind other technical design disciplines like electronic and mechanical engineering is in standards of evidence.

To illustrate what I mean, I want to talk about the July 2023 congressional hearings on Unidentified Anomalous Phenomena (“UAPs”).

Former military and intelligence personnel gave testimony under oath about encounters with UAPs, and some sensational claims were made by David Grusch – who had worked with the UAP Task Force at the Department of Defense – about captured “non-human” aircraft, materials and “biologics” being held by private defence contractors.

Some UFO researchers hold up the testimony of these very credible witnesses as proof that we are being visited by at least one civilisation far in advance of ours technologically.

The scientists working in the DoD’s All-domain Anomaly Resolution Office (AARO), and in NASA’s UAP working group disagree with that interpretation.

Witness testimony – even given under oath – is merely evidence that somebody said something. And maybe they really believe what they say. But that doesn’t make it real.

Since that congressional hearing more than 2 years ago, no hard evidence has entered the public or scientific domain that supports Grusch’s claims.

The NASA working group complained that the military were less than forthcoming with good data that it’s believed they may be holding (they admit as much on the record). But, again, that’s not in itself evidence of “non-human” visitation and alien vehicle reverse-engineering projects. It’s evidence that the military and their contractors are keeping secrets. Who knew, right?

And, yes, there are videos – confirmed by the military to be genuine – showing anomalous objects recorded by Air Force and Navy personnel during routine operations off the coast of the United States and in combat zones around the world.

But those videos, taken by themselves, show nothing particularly sensational. Accounts of “instant acceleration” and other impossible manoeuvres accompany these videos, but are not captured in them.

And that has been the general nature of UFO/UAP evidence going back to the 1940s. When the anecdotal noise is filtered out, there’s very little left in the way of credible, meaningfully testable evidence to support the extraterrestrial (or extra-dimensional, or time-traveller, or hollow-earth-dweller or Atlantean or Lunar Nazi) hypothesis.

What hard evidence does exist points to one or more genuinely unknown physical phenomena. But that doesn’t mean aliens. That just means ¯\_(ツ)_/¯

More than 20 years ago, I corresponded with famous UFO researcher Stanton T. Friedman. His central claim was that “the evidence is overwhelming that some UFOs are extraterrestrial spacecraft”. He was kind enough – at his own expense – to mail me a thick folder of this “overwhelming evidence”, which included reports written after official government studies in the US, France and other countries. (The UK MOD’s 1990s study, Project Condign, was declassified a couple of years later, adding to the corpus of scientific studies.)

All of these studies, if you read beyond the executive summary, come to a similar conclusion: UFOs are real, and we don’t know what they are.

They usually also conclude that further scientific study is warranted. But that’s rarely followed through, because UFOs are the “third rail” of a scientific career – unless you’re safely tenured, like Avi Loeb or Michio Kaku, most scientists daren’t touch the subject.

Anyway, back to Mr Friedman. Stanton Friedman was a scientist – a nuclear physicist (a real one!). He would often wear these credentials as evidence that his approach to the study of UFOs was rigorous in the same sense that his work on, say, nuclear space propulsion was rigorous.

But that was simply not the case. Friedman, like most UFOlogists – not all, mind – approached the subject like an investigative journalist. He didn’t look for physical evidence. He looked for documentation to support his theories, and his “rigour” manifested in attempts to authenticate these documents.

Even if the MAJESTIC documents are from genuine top secret classified government files (and that’s still very much disputed to this day), a document is only evidence that somebody wrote something down.

So I was not overwhelmed by the evidence Stanton sent me. Intrigued? Definitely. Open-and-shut case? Definitely not.

I agree with many of the official government studies: UFOs warrant serious scientific investigation. But, curiously, many UFOlogists – including Stanton Friedman – disagreed.

I had been following the work of an electronics engineer called Scot Stride, at the time working at NASA’s Jet Propulsion Laboratory, who was proposing multi-modal instrumented searches of the sky to collect more and better data on these phenomena.

He called it “SETV” – the Search for Extraterrestrial Visitation, not to be confused with SETA, the Search for Extraterrestrial Artefacts. SETV was concerned with contemporary visitation, and its null hypothesis was that no UFOs are extraterrestrial technology.

Now, to me, a not-so-long-ago-at-the-time physics bod, SETV sounded like a good idea. The challenge in understanding the nature of UFOs has always been the amount and the quality of the data – too much noise, very little signal.

An object tracked by multiple sensors, from multiple locations, could provide far clearer data on the reality (as in, is this object real and not, say, a sensor blip?), the size, the distance, and therefore the speed or acceleration of objects in the sky.
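To make that concrete, here’s a toy sketch – with entirely made-up numbers, and no connection to any actual SETV design – of how two simultaneous bearings from known locations pin down an object’s position, and how two such fixes a second apart yield a speed estimate:

```python
import math

# Toy triangulation: two sensors at known positions each report a bearing
# (angle from the x-axis) to the same object. Intersecting the two bearing
# lines gives a position fix; two fixes a second apart give a speed estimate.
def triangulate(p1, theta1, p2, theta2):
    # Each sensor defines a ray p + t * (cos(theta), sin(theta)).
    # Solve the 2x2 linear system for where the two rays cross.
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    t = ((p2[0] - p1[0]) * (-d2[1]) - (-d2[0]) * (p2[1] - p1[1])) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Sensors at (0, 0) and (10, 0) km; bearings in degrees; fixes one second apart.
fix_a = triangulate((0, 0), math.radians(45), (10, 0), math.radians(120))
fix_b = triangulate((0, 0), math.radians(50), (10, 0), math.radians(115))
print(f"position: {fix_a}, speed: {math.dist(fix_a, fix_b):.2f} km/s")
```

One eyewitness guessing at the size and distance of a light in the sky can’t do that. Two instruments with known positions and calibrated clocks can.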

But Stanton poured cold water on the idea of instrumented searches. UFOs, he told me, cannot be studied scientifically. Which I thought was a little odd, given his physics credentials – far superior to mine – and that he was kind of using them to shore up the credibility of his work. He was the “flying saucer physicist”.

SETV, as far as I know, never got off the ground – perhaps due to lack of funding. (A similar initiative called UFODATA also appears to have stalled. I hold out some hope Avi Loeb may help to divert some research money into other instrumented sky searches.)

But, as of today, the state of the art in UFO/UAP evidence is lots of noise and very little signal.

I’ve met similar hostility from folks who, in one breath, claim that software engineering is “scientific” – because data – but row back on that when I suggest we might need better data: more signal, less noise.

Most empirical studies into our discipline are small, attempting to extract meaningful trends from sample sizes too small to achieve statistical significance. This leaves them wide open to statistical noise.
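A quick simulation shows how easily that noise masquerades as a finding. Here both “groups” are drawn from exactly the same distribution – there is no real effect at all – yet with ten data points apiece the apparent difference swings wildly from study to study (the numbers are invented, purely for illustration):

```python
import random

random.seed(1)

# Draw two small "study" groups from the SAME distribution (no real effect)
# and report the apparent difference between them. With n=10 per group,
# pure noise can easily look like a meaningful result.
def fake_study(n=10):
    control = [random.gauss(50, 15) for _ in range(n)]    # e.g. bugs per release
    treatment = [random.gauss(50, 15) for _ in range(n)]  # same distribution!
    return sum(treatment) / n - sum(control) / n

for i in range(5):
    print(f"study {i + 1}: apparent effect = {fake_study():+.1f} bugs")
```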

Many studies are, like the congressional UAP hearings, building on reported – rather than directly observed – data. If a development team tells you that switching from white bread to wholemeal reduced their bug counts, that’s anecdote, not hard evidence.

Some folks say that software engineering is scientific because it’s grounded in scientific principles – many would argue that engineering is the “appliance of science”.

But what are the scientific principles software engineering is founded on? We might argue that discrete mathematics – set theory, logic, graph theory etc – is a science. And so there’s perhaps some merit, when these theories are applied (e.g., in program verification), in saying that we’re applying science.

But we’re not testing the theories. They are taken as a given – as logically proven. And by “logically proven”, I mean logically consistent with all the connected theories. A scientist might argue that proofs aren’t science.

To quote Donald Knuth:

“Beware of bugs in the above code; I have only proved it correct, not tried it.”

Or, in the words of the Second Doctor: “Logic, my dear Zoe, merely enables one to be wrong with authority”.

Just because axioms are logically consistent, that doesn’t mean they’re true. To establish truth, we must defer to reality. We must test them in the real world. In this sense, mathematics is not science. It’s applied philosophy.
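A toy demonstration of that deference to reality: associativity of addition is logically watertight over the real numbers, but the IEEE 754 floating point our programs actually run on doesn’t honour it.

```python
# On paper, (a + b) + c == a + (b + c) for all real numbers.
# On the machine, floating-point rounding breaks the "proof":
lhs = (0.1 + 0.2) + 0.3
rhs = 0.1 + (0.2 + 0.3)

print(lhs == rhs)        # False
print(lhs, rhs)          # 0.6000000000000001 0.6
print(0.1 + 0.2 == 0.3)  # also False, for the same reason
```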

And that brings me to the third way in which I and others part company. In order to meaningfully test a hypothesis, we must be able to know with high confidence when the data contradicts it. If software engineering is to be truly scientific, our hypotheses need to be refutable.
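What might a refutable claim look like in our world? Here’s a sketch – the practice, the names and the numbers are all hypothetical – of a claim that states up front what observation would kill it:

```python
# A hypothetical, refutable claim: "Teams adopting practice X will cut their
# change failure rate by at least 20% within six months." The claim names the
# measurement, the predicted effect, and the condition under which it loses.
def claim_is_falsified(rate_before: float, rate_after: float,
                       predicted_improvement: float = 0.20) -> bool:
    """Return True if the observed data contradicts the prediction."""
    observed_improvement = (rate_before - rate_after) / rate_before
    return observed_improvement < predicted_improvement

# Hypothetical measurements: 12% of changes caused a failure before adoption,
# 11% after – roughly an 8% improvement against a predicted 20%.
print(claim_is_falsified(0.12, 0.11))  # True: the data contradicts the claim
```

A real study would need decent sample sizes and significance testing on top of this, of course. The essential ingredient is simply that the claim can lose.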

As computer programmers, we know the challenge of expressing ideas in a way that can’t be misinterpreted. It’s a large part of our work, and the main reason why computer programming remains a minority pursuit. It is hard.

But just like the cognitive dissonance of the anti-science “flying saucer physicist”, many of us hold the contradictory belief that hypotheses about our field of work need not be expressed in any refutable form – they need not be meaningfully testable – even though that’s kind of what we do for a living.

When we combine woolly and untestable claims with small, noisy data sets consisting mostly of anecdotes, software engineering as a discipline falls well within the territory of UFOlogy.

Now, not everybody subscribes to the idea that for a study to be scientific, its hypotheses must be refutable. As physics undergraduates, we have refutability drummed into us from the start (e.g., Wolfgang Pauli’s “not even wrong” jab at ambiguous claims), and it causes friction with other fields of study that describe themselves as “scientific” but lack refutability.

But whether we agree on the definition of “scientific” is not really the important thing here. What matters is where low-signal, largely anecdotal, non-refutable experiments have led us in our understanding of not just what works and what doesn’t in specific situations, but why.

A lot of what we think we know about creating and adapting software is built on the equivalent of UFO reports. Let me give you an example of what can happen when research cuts through that noise.

In their study of developer testing in Java IDEs, researchers found that, of the participants who claimed they did Test-Driven Development, only about 8% actually did – according to analysis of their real IDE activity.

The implication here is that 92% of what we think we know about TDD and its outcomes is, in reality, based on developers doing something else. Many other studies – usually on much smaller scales – lead me to question whether participants were really doing TDD. And, indeed, whether the authors of the studies could even tell if they weren’t.
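For what it’s worth, IDE event logs are exactly the kind of data that can settle the question. This is not the researchers’ actual method – just a rough sketch of how a session might be classified as test-first or test-after from what actually happened, rather than from what was claimed:

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str         # "edit_test", "edit_prod" or "run_tests"
    result: str = ""  # for "run_tests": "fail" or "pass"

def looks_like_tdd(events: list[Event]) -> bool:
    """Very rough heuristic: a test edit, then a failing run, then a
    production edit, then a passing run, in that order."""
    expected = [("edit_test", ""), ("run_tests", "fail"),
                ("edit_prod", ""), ("run_tests", "pass")]
    remaining = iter(events)
    for kind, result in expected:
        if not any(e.kind == kind and (not result or e.result == result)
                   for e in remaining):
            return False
    return True

tdd_session = [Event("edit_test"), Event("run_tests", "fail"),
               Event("edit_prod"), Event("run_tests", "pass")]
test_after_session = [Event("edit_prod"), Event("edit_test"),
                      Event("run_tests", "pass")]
print(looks_like_tdd(tdd_session))         # True
print(looks_like_tdd(test_after_session))  # False: test-after, not test-first
```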

The upshot of all this is that when skeptics demand “proof” of the benefits of TDD, even someone like me, with 26 years’ experience of doing and teaching it, has to resort to “you’ll just have to take my word for it”. Like the UFO witnesses who “know what they saw”, I know there are real benefits. I just don’t have the hard data to back it up. For every study that finds there are, there’s another one that concludes ¯\_(ツ)_/¯

I could survey developers who’ve been doing TDD for, say, more than a year, to ask if they believe there are real benefits. I could ask them if they’d ever consider going back to test-after. (I already have a pretty good idea what the response would be.)

But this is shaky ground. The majority of developers using “AI” coding assistants, for example, believe they’re being more productive. But data on delivery lead times and release stability paints the opposite picture in the majority of cases.

As a teacher and a mentor, the lack of genuine signal in the software engineering body of knowledge makes my job a lot harder.

I have to resort to my powers of persuasion, and I have to rely on people’s willingness to at the very least suspend their disbelief. I did not need to be persuaded that force = mass x acceleration, because the evidence is so compelling.

It also leaves our profession vulnerable to spurious claims that aren’t backed up by credible evidence, but can’t easily be disproved. I might argue that a whole bunch of people’s pensions might be about to be wiped out by a spurious claim about the impact of a particular technology on software teams. Our industry’s very much the rabbit that the GenAI folks are banking on other industries chasing. If programmers don’t get much benefit, what chance lawyers or doctors or teachers?

I appreciate that the complex socio-technical nature of software development presents many challenges to a rigorously scientific approach to gaining useful insights – to learning to predict the effects of pulling certain levers so that we can more confidently engineer the outcomes we want. And I accept that there will always be aspects that remain beyond the scientific method.

However, it feels to me like we’re not even really trying. And we’re so good at making excuses for why we can’t do better.

If there’s one thing we’re not short of as a discipline, it’s hard data. Our activities – like the actions we perform in our IDEs, the code itself, the version histories in our repos, the outputs of builds, the results of testing and linting, the telemetry from production systems – are radiating a rich and long tail of hard data; data about things that actually happened, and not just what we believe or claim happened.

If we were comets, you’d want to fly your probe through that tail.
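To give a flavour of what probing that tail can look like, here’s a minimal sketch – assuming a Git repository in the current directory, and the (crude) convention that test files have “test” somewhere in their paths – that asks the version history, rather than the developers, how often commits actually touch test code:

```python
import subprocess

# Ask the repo, not the team: what proportion of commits touch test code?
log = subprocess.run(
    ["git", "log", "--name-only", "--pretty=format:COMMIT"],
    capture_output=True, text=True, check=True,
).stdout

total_commits = 0
commits_with_tests = 0
touches_test = False

for line in log.splitlines():
    if line == "COMMIT":
        commits_with_tests += touches_test  # bank the previous commit's status
        total_commits += 1
        touches_test = False
    elif "test" in line.lower():
        touches_test = True

commits_with_tests += touches_test  # don't forget the final commit

print(f"{commits_with_tests} of {total_commits} commits touch test code")
```

It’s crude, but it’s data about behaviour, not belief.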

Again, there are many challenges and problems to be solved, not least of which is the ad hoc, proprietary, non-standardised nature of all that data.

In that sense, we are arguably one of the least mature of the engineering disciplines. My Dad’s architectural CAD system can tell you what order a house has to be built in (you have to do that these days to get planning permission) and can even generate orders for building materials with specific suppliers.

Our tooling workflows are still mostly held together with twigs and tape. And that’s chiefly because we so seldom consider the whole picture when we design development tools – a random landscape of point solutions that don’t play nice with each other.

We lack the data interchange standards of more mature disciplines. And that could well be because we also lack the underlying rigour – including rigour around terminology. How do we standardise things that go by many different names?
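Purely to illustrate the idea – every field name below is invented, and no such standard currently exists – a vendor-neutral record of a test run might look something like this, regardless of what each tool calls these things internally:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# A hypothetical, tool-agnostic record that any test runner could emit.
@dataclass
class TestRunRecord:
    tool: str               # which runner produced the record
    commit: str             # the version the tests ran against
    started_at: str         # ISO 8601 timestamp
    duration_seconds: float
    tests_run: int
    tests_failed: int

record = TestRunRecord(
    tool="junit", commit="abc1234",
    started_at=datetime.now(timezone.utc).isoformat(),
    duration_seconds=42.7, tests_run=318, tests_failed=2,
)
print(json.dumps(asdict(record), indent=2))
```

Agree a handful of records like this across the toolchain and a lot of that long tail of hard data suddenly becomes comparable across teams, tools and studies.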

But that is a solvable problem. If building design and electronic engineering and mechanical engineering can do it, so can we. Heck, we did it for them! (We suffer from a condition I call “builder’s houses”.)

And if this reads like a bit of a manifesto, so be it. I’m well aware that I’m in a minority who feel this way about software engineering. But if you’re out there thinking along similar lines, maybe drop me a line?
