If it's not good testing, it's not good regression testing either.

Pick a coin from your pocket and hold it at arm's length. Take a good look. Now take another one of the same denomination and hold it out at arm's length as before. Based on your observations alone, can you say they are identical?

Let's go a step further. If someone had given you one coin to look at, then exchanged it for another, could you have determined whether they were the same coin or different coins? Maybe. If the differences had been large enough, e.g. one coin was heavily tarnished or scratched, then you could have told them apart. Or if you'd been given the opportunity to examine the coins using magnifying equipment, you probably could have found differences.

But let's assume our only test was a standard set of checks, i.e. viewing at arm's length and comparing what we see with our notes/records. It's better than nothing: I would see some differences, and some might be important ones. For example, if my next coin was blank, I might suspect an issue with my coin supply and investigate.

What about my next coin? It is blank on one side. Unfortunately, it's not the side I check when I hold it at arm's length. So as far as my checks are concerned, there has been no regression in the quality of the coins being produced by my pocket. Until I go 'live' and try to spend my coins out in the real world of shopkeepers, I'm none the wiser.

Do you see the flaw in our logic here? If we notice a degradation in coin quality, the testing is good. If the testing does not find an issue, it must still be good, because previously those checks found a different issue. Because I was only performing one test, or one set of tests, I was blind to any issue that test can't see.
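As a rough illustration of that blind spot, here is a minimal Python sketch. The `Coin` class, the `arms_length_check` function and the "portrait"/"crest" values are all made up for this example; they simply stand in for "the one check we keep re-running".

```python
from dataclasses import dataclass


@dataclass
class Coin:
    heads: str  # the side we look at when holding the coin at arm's length
    tails: str  # the side our standard check never turns over


# Hypothetical "known good" observation from our notes/records.
EXPECTED_HEADS = "portrait"


def arms_length_check(coin: Coin) -> bool:
    """The single regression check: it only ever inspects one side."""
    return coin.heads == EXPECTED_HEADS


good_coin = Coin(heads="portrait", tails="crest")
blank_coin = Coin(heads="", tails="")
half_blank_coin = Coin(heads="portrait", tails="")  # blank on the unchecked side

results = {
    "good": arms_length_check(good_coin),
    "blank": arms_length_check(blank_coin),
    "half_blank": arms_length_check(half_blank_coin),
}
print(results)
```

The fully blank coin fails the check, which is what earns the check its reputation. The half-blank coin passes, and a green result here says nothing about the side the check never looks at.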

If we'd been testing the coins independently, we probably would have been more critical. We might have thought: sure, it looks good in the arm's-length test, but what about the weight? Maybe that's wrong. We'd try a number of different tests to find an issue. We'd ask other people about coins, learn about their two-sided nature and perform tests for it.
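A sketch of what that more critical approach might look like in code, again with invented names and values (the 8.0 gram nominal weight and 0.2 gram tolerance are assumptions chosen purely for illustration):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Coin:
    heads: str
    tails: str
    weight_grams: float


def looks_right_at_arms_length(coin: Coin) -> bool:
    # The original check: one side only.
    return coin.heads == "portrait"


def reverse_is_stamped(coin: Coin) -> bool:
    # Added after learning that coins have two sides.
    return coin.tails != ""


def weight_is_plausible(coin: Coin) -> bool:
    # Hypothetical nominal weight and tolerance.
    return abs(coin.weight_grams - 8.0) <= 0.2


CHECKS = [looks_right_at_arms_length, reverse_is_stamped, weight_is_plausible]


def failed_checks(coin: Coin) -> List[str]:
    """Return the names of every check the coin fails."""
    return [check.__name__ for check in CHECKS if not check(coin)]


suspect = Coin(heads="portrait", tails="", weight_grams=8.0)
print(failed_checks(suspect))
```

Even this list is not a super-test. It only covers the questions we have thought to ask so far, so it needs to keep growing as we learn more about coins.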

But as soon as we enter 'regression testing' mode, we often start to disregard this behaviour and mindlessly run the same tests. We avoid exploration, sometimes without noticing. Sometimes people actively avoid exploration during regression testing, thinking it's inappropriate. This approach assumes that the test you have been running is some kind of super-observer, capable of helping you to see all problems.

If the system has changed significantly, with the addition or removal of complex behaviours, surely the tests might also need to adapt? The assumption that the same test will somehow catch a change in functionality, reliability etc. is based on the premise that our super-test was testing everything -before- and still is. As testers we know it wasn't, isn't and never will be that super-test. We need to adapt to each new release in an attempt to find new issues. If our tests aren't finding an issue, it's just as possible that the tests are ineffective as it is that the system isn't defective.
