Wrong end of the stick

There's a story about air force scientists during World War 2 that illustrates an interesting idea: the things we see can distort our assumptions. The story goes that Allied bombers were suffering heavy losses during their air raids on continental Europe. The Allied scientists got together and analysed the damage reports from the engineers tasked with fixing the planes after each raid. (One of the scientists working on these problems was Abraham Wald.)

Here is an example of the sort of summary engineering report they might have been faced with. The report lists the parts of the plane and the proportion of returning aeroplanes that had been damaged in each area. (This data is completely made up by me.)
15% had damage to 1 or more engines
25% had tail damage
25% had damage to the nose and cockpit area
35% had damage to the fuselage

The aircraft engineers could only add extra armour to one part of the plane; any more armour would limit the aeroplane in other ways, e.g. making it an easier target or leaving it unable to carry its deadly cargo. Where would you add the armour? If you wanted to do your best to ensure that the plane and its crew returned, where would you place your bet?

The story goes that the answer relies on two more pieces of information. Firstly, the flak could affect any part of the aeroplane, and didn't tend to hit one part more than another. Secondly, the engineers' data is not the full picture. It suffers from a (literal) survivorship bias. What about the planes that didn't come back? What parts of the aircraft are not listed in the engineers' reports?

For example, the wings are not mentioned above. The idea is that the most critically damaged aircraft never made it back to the engineers. These would never be recorded in the statistics, and so the damage reports tended to show an almost inverted view of what needed to be armoured. That is, if a plane received damage to its wings, it never came home. The wings needed the armour most.
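To make the inversion concrete, here is a minimal simulation sketch in Python. It assumes, as in the story, that flak hits every section equally often. The `LOSS_CHANCE_IF_HIT` figures, the one-hit-per-plane simplification and the function name `simulate` are all invented by me for illustration; none of it is historical data.

```python
import random

# Hypothetical chance that a plane is lost when hit in a given section.
# These figures are made up for illustration; they are not historical data.
LOSS_CHANCE_IF_HIT = {
    "engines": 0.6,
    "tail": 0.3,
    "nose and cockpit": 0.3,
    "fuselage": 0.1,
    "wings": 1.0,  # assumption from the story: a wing hit never comes home
}


def simulate(num_planes=10_000, seed=1):
    """Return (hits on all planes, hits seen on returning planes) per section."""
    rng = random.Random(seed)
    sections = list(LOSS_CHANCE_IF_HIT)
    hits_all = dict.fromkeys(sections, 0)
    hits_returned = dict.fromkeys(sections, 0)
    for _ in range(num_planes):
        # Simplification: each plane takes exactly one hit, uniformly at random.
        section = rng.choice(sections)
        hits_all[section] += 1
        # The plane only reaches the engineers if it survives the hit.
        if rng.random() >= LOSS_CHANCE_IF_HIT[section]:
            hits_returned[section] += 1
    return hits_all, hits_returned


if __name__ == "__main__":
    hits_all, hits_returned = simulate()
    total_returned = sum(hits_returned.values())
    for section in hits_all:
        true_share = hits_all[section] / sum(hits_all.values())
        seen_share = hits_returned[section] / total_returned
        print(f"{section:17} true {true_share:5.1%}   in reports {seen_share:5.1%}")
```

Under these assumptions the wings take about a fifth of all hits yet never show up in the reports, while the section that is safest to be hit in should make up the largest share of the recorded damage.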

This is a situation I've witnessed in software testing. The phenomenon can show itself in many ways. One example is a simple misuse of metrics: does feature X have 10 bug reports recorded against it, while feature Y has just 2? Maybe feature Y isn't less broken, but so broken that no one can use it well enough to find more bugs. Meanwhile the 'buggy' feature X is popular and receives a lot of attention from its users, who report the quirks and bugs they see.
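One way to guard against this is to look at bug reports relative to usage rather than as raw counts. The sketch below takes the 10 and 2 bug reports from the example; the session counts and the helper name `reports_per_1000_sessions` are hypothetical, chosen only to show how the picture can flip.

```python
def reports_per_1000_sessions(bug_reports, sessions):
    """Bug reports normalised by usage; None when there is no usage to judge by."""
    if sessions == 0:
        return None  # nobody used the feature, so the count tells us nothing
    return 1000 * bug_reports / sessions


# Bug counts come from the example above; session counts are invented.
features = {
    "X": {"bug_reports": 10, "sessions": 5000},  # popular feature
    "Y": {"bug_reports": 2, "sessions": 40},     # hardly anyone gets far with it
}

rates = {
    name: reports_per_1000_sessions(f["bug_reports"], f["sessions"])
    for name, f in features.items()
}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"feature {name}: {rate} reports per 1000 sessions")
```

With these made-up usage figures, feature Y looks far worse per session than feature X, even though its raw bug count is five times smaller.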

A more subtle example: in a performance test, one server appears to show fewer errors. Maybe that server has the 'right' configuration, or its hardware is better: let's make all our servers like the 'good one'. But it could be that this server is misconfigured or mismanaged in some way. Perhaps it's not taking its fair share of the load, forcing an overload on the other servers. In this case, approaching the results sceptically might save you from misinterpreting them and propagating a 'bad configuration' just because it seemed to help in one scenario.
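A quick sanity check here is to compare each server's share of the traffic alongside its error rate. This is only a sketch: the server names, request and error counts, the `load_report` helper and the "less than half of a fair share" threshold are all assumptions of mine, not figures from a real test.

```python
def load_report(servers, underload_factor=0.5):
    """Per server: share of total requests, error rate, and an underload flag."""
    total = sum(s["requests"] for s in servers.values())
    if total == 0:
        raise ValueError("no requests recorded, nothing to compare")
    fair_share = 1 / len(servers)
    report = {}
    for name, s in servers.items():
        share = s["requests"] / total
        error_rate = s["errors"] / s["requests"] if s["requests"] else None
        report[name] = {
            "share": share,
            "error_rate": error_rate,
            # Flag servers doing far less than their fair share of the work.
            "underloaded": share < underload_factor * fair_share,
        }
    return report


# Invented results from a hypothetical three-server performance test.
servers = {
    "a": {"requests": 4500, "errors": 90},
    "b": {"requests": 4500, "errors": 95},
    "c": {"requests": 1000, "errors": 5},  # the apparent 'good one'
}

report = load_report(servers)

if __name__ == "__main__":
    for name, row in report.items():
        print(name, row)
```

In this invented data, server c has the lowest error rate but handles only a tenth of the requests, so its 'good' result says more about how the load was spread than about its configuration.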

For a tester, the simple heuristic is that your apparent results are just that: apparent, to you. They may in fact represent, as above, an entirely 'negative' image of how the software is actually behaving. It's worth spending some time testing your tests, because how do you know you haven't got the 'wrong end of the stick'?

I hope I haven't trivialised an important, albeit dark, aspect of European history with this post. I hope I have helped to put the information learned to a better purpose. For those interested in some of the effects of the Allied bombing on continental Europe, you may wish to start by reading about the Bombing of Dresden. You may also find articles concerning The Blitz of interest.
