
Wrong end of the stick

There's a story about air force scientists during World War II that illustrates an interesting idea: the things we see can alter our assumptions. The story goes that the Allied bombers were suffering heavy losses during their air raids over continental Europe. The Allied scientists got together and analysed the damage reports from the engineers tasked with repairing the planes after each raid. (One of the scientists working on these problems was Abraham Wald.)

Here is an example of the sort of summary engineering report they might have been faced with. The report lists the parts of the plane and what proportion of aeroplanes had been damaged in each area (this data is entirely made up by me):
15% had damage to 1 or more engines
25% had tail damage
25% had damage to the nose and cockpit area
35% had damage to the fuselage

The aircraft engineers could only add extra armour to one part of the plane; any more armour would limit the aeroplane in other ways, e.g. making it an easier target or unable to carry its deadly cargo. Where would you add the armour? If you wanted to do your best to ensure that the plane and its crew returned, where would you place your bet?

The story goes that the answer relies on two more pieces of information. First, the flak could hit any part of the aeroplane and didn't tend to affect one part more than another. Second, the engineers' data was not the full picture. It suffered from a [literal] survivorship bias. What about the planes that didn't come back? Which parts of the aircraft are not listed in the engineers' reports?

For example, the wings are not mentioned above. The idea is that the most critically damaged aircraft never made it back to the engineers. They were never recorded in the statistics, so the damage reports tended to show an almost inverted view of what needed armour. That is, if a plane received damage to its wings, it never came home. The wings needed the armour most.
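The survivorship effect is easy to reproduce with a toy simulation. This is a minimal sketch, not a model of real bomber data: it assumes flak lands on each of five hypothetical regions with equal probability, and that any hit to the wings downs the plane (both assumptions are made purely for illustration). Across all planes every region is hit at about the same rate, yet the wings disappear entirely from the damage seen on the planes that return.

```python
import random
from collections import Counter

# Hypothetical regions; assume flak hits each one with equal probability.
REGIONS = ["engines", "tail", "nose", "fuselage", "wings"]
# Assumption for illustration only: any hit to these regions downs the plane.
FATAL = {"wings"}


def simulate(n_planes=10_000, max_hits=3, seed=1):
    """Return (damage on all planes, damage on returning planes, n_planes, n_returned)."""
    rng = random.Random(seed)
    all_damage = Counter()
    returned_damage = Counter()
    returned = 0
    for _ in range(n_planes):
        hits = [rng.choice(REGIONS) for _ in range(rng.randint(0, max_hits))]
        damaged = set(hits)  # count each region at most once per plane
        all_damage.update(damaged)
        if damaged.isdisjoint(FATAL):  # no fatal hit: the plane makes it home
            returned += 1
            returned_damage.update(damaged)
    return all_damage, returned_damage, n_planes, returned


def report(counter, total):
    """Percentage of planes with damage in each region."""
    return {region: round(100 * counter[region] / total, 1) for region in REGIONS}


all_damage, returned_damage, n, returned = simulate()
print("All planes:      ", report(all_damage, n))
print("Returning planes:", report(returned_damage, returned))
```

The engineers only ever saw the second line of output, which is exactly where the wings read as undamaged.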

This is a situation I've witnessed in software testing, and the phenomenon can show itself in many ways. Take a simple misuse of metrics: feature X has 10 bug reports recorded against it, but feature Y has just 2. Maybe feature Y isn't less broken, but so broken that no one can use it well enough to find more bugs. Meanwhile the 'buggy' feature X is popular and receives a lot of attention from its users, who report the quirks and bugs they see.
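One way to keep raw counts honest is to normalise them by how much each feature is actually used. The sketch below is hypothetical: the bug counts come from the example above, but the session figures are invented for illustration, and `FeatureStats` is not from any real tool. Once usage is taken into account, feature Y's two reports look far more worrying than feature X's ten.

```python
from dataclasses import dataclass


@dataclass
class FeatureStats:
    name: str
    bug_reports: int
    sessions: int  # hypothetical usage figure, e.g. from product analytics

    def bugs_per_1000_sessions(self) -> float:
        if self.sessions == 0:
            # Nobody uses it: treat that as a red flag, not as "no bugs".
            return float("inf")
        return 1000 * self.bug_reports / self.sessions


# Bug counts from the example; session counts are made up.
features = [
    FeatureStats("X", bug_reports=10, sessions=5000),
    FeatureStats("Y", bug_reports=2, sessions=20),
]

# Most worrying feature first.
for f in sorted(features, key=FeatureStats.bugs_per_1000_sessions, reverse=True):
    print(f"{f.name}: {f.bug_reports} reports, "
          f"{f.bugs_per_1000_sessions():.1f} per 1000 sessions")
```

A feature with almost no usage is itself worth investigating rather than evidence of quality, which is why the sketch treats zero sessions as maximally suspicious instead of quietly dividing it away.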

A more subtle example: in a performance test, one server appears to show fewer errors. Maybe that server has the 'right' configuration, or its hardware is better: let's make all our servers like the 'good one'. But it could be that this server is misconfigured or mismanaged in some way. Perhaps it's not taking its fair share of the load, forcing an overload on the other servers. In this case, approaching the results skeptically might save you from misinterpreting them and propagating a 'bad configuration' just because it seemed to help in one scenario.
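Before crediting the 'good' server's configuration, it can help to check whether each server actually received its share of the traffic. This is a minimal sketch under stated assumptions: it assumes the load balancer is meant to spread requests evenly (e.g. round-robin) and that per-server request and error counts are available from your test logs; the function name, server names, numbers and the 25% tolerance are all hypothetical.

```python
def load_share_report(stats, tolerance=0.25):
    """Compare each server's share of requests with an even split.

    stats maps server name -> (requests_handled, errors).
    tolerance is the allowed relative deviation from the even split.
    """
    total = sum(requests for requests, _ in stats.values())
    fair_share = total / len(stats)  # assumes an even split is expected
    report = {}
    for server, (requests, errors) in stats.items():
        deviation = (requests - fair_share) / fair_share
        if deviation < -tolerance:
            status = "under-loaded"
        elif deviation > tolerance:
            status = "over-loaded"
        else:
            status = "ok"
        report[server] = {
            "share": requests / total,
            "error_rate": errors / requests if requests else 0.0,
            "status": status,
        }
    return report


# Hypothetical results: app-3 looks like the "good one" on errors alone.
results = {"app-1": (4500, 90), "app-2": (4400, 88), "app-3": (1100, 5)}
for server, row in load_share_report(results).items():
    print(f"{server}: share={row['share']:.0%} "
          f"errors={row['error_rate']:.2%} {row['status']}")
```

Here `app-3` has the lowest error rate, but it also handled roughly a third of its fair share of requests, while the other two servers were pushed well above theirs.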

For a tester, the simple heuristic is that your apparent results are just that: apparent, to you. They may in fact represent, as above, an entirely 'negative' image of how the software is actually behaving. It's worth spending some time testing your tests, because how do you know you haven't got the 'wrong end of the stick'?

I hope I haven't trivialised an important, albeit dark, aspect of European history with this post, and that I have helped to put the lessons learned to a better purpose. For those interested in some of the effects of the Allied bombing on continental Europe, you may wish to start by reading about the Bombing of Dresden. You may also find articles about the Blitz of interest.

