
Wasting your time with Test Automation

Software testing is essentially an infinitely time-consuming task being attempted in a finite time. The 'test space' is almost always vast, near infinite, while your time to test is usually counted in hours. There's an obvious mismatch there. We're testers; we are hired to help marry the two. We need to find as many issues as we can, and the important ones, in that vast test space within a few hours. That's fine, that's testing, that's what testers work (live?) for.

Roll on test automation, our saviour: it can check vast areas of the test space rapidly and efficiently. It can use its 'Data' ( http://en.wikipedia.org/wiki/Data_%28Star_Trek%29 )-like abilities to test the application tirelessly. No? Well OK, it could 'check' the application tirelessly against a set of expected results. That is itself potentially valuable, and could examine a range of combinations or test data that we could never reach alone.

So why do many of the test automation efforts I've witnessed on customer sites not deliver this? Many customers put in the hard work; they've made a serious investment of time and effort. So why do they end up spending more and more time fixing the tests, slowly lowering their expected 'pass rate', or re-running the tests until they 'give the right result'?

One major issue is reliability: the 'tests' are just not reliable enough for what they were intended to do. The large array of checks should be steadily helping the testers more and more, by reaching those hard-to-reach parts of the application. But the unreliable nature of the tests means that each newly implemented check actually just adds to the noise in the signal and to the maintenance burden.

Let's look at an example. Say we have 300 'automated tests'.

Let's assume these tests are 95% accurate, and only give a false positive 5% of the time.
(This 5% could be down to a plethora of causes such as flakiness in the test-tool itself, problems with the wider system/support systems, network issues, out-of-date 'expected results' - or even poorly written test code.)

Also, let's say the tests are applied in the correct areas and would highlight 15 bugs in the system.
I'll be even more generous and say that if the tests flag a bug when testing a buggy area, then there definitely is a bug (no false negatives).

That means that the checks will correctly flag the 15 real bugs.
They will also false-flag (300 - 15) × 5% ≈ 14 fake bugs.

When the checks finish, 14 of the 29 flagged results, roughly 48%, will be false positives: incorrect indications of a bug.

So in summary, even if the tests are 95% reliable (that's high, in my experience), approximately half of the tester's work reviewing the results will potentially be a waste of time. Time that could be spent looking at the system under test is instead spent looking at flaky test results and bad test code. Those precious few hours of testing are misspent.
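
To make the arithmetic concrete, here is a minimal sketch of the example in Python. The figures (300 checks, a 5% false-positive rate, 15 real bugs, no false negatives) are the assumptions from the example above, not measurements.

```python
# A minimal sketch of the arithmetic above, using the example's assumptions:
# 300 checks, a 5% false-positive rate, 15 real bugs and no false negatives.
total_checks = 300
false_positive_rate = 0.05
real_bugs = 15

# Checks run against 'clean' areas can still wrongly flag a bug.
false_positives = round((total_checks - real_bugs) * false_positive_rate)  # round(285 * 0.05) = 14
flagged_results = real_bugs + false_positives                              # 15 + 14 = 29
wasted_review = false_positives / flagged_results                          # 14 / 29 ≈ 0.48

print(f"{false_positives} of {flagged_results} flagged results "
      f"({wasted_review:.0%}) are false alarms")
```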

The solutions are not as simple as just 'making the tests more reliable'. While well-written code and good infrastructure can help immensely, the problems tend to be more fundamental. Some of the problems and solutions I've seen are:


  1. The checks are asking binary questions [of complicated systems]. Try giving reports back instead, rather than hard pass/fail results. For example: is an HTTP 302 response a FAIL when you expected an HTTP 200? It might just be that the application has changed. A report covering the actual findings, plus all the other information you get for free, might be more useful. For example: how long did that response take? What was the size of that response? You could view those results directly, or analyse/graph them as you see fit, looking for patterns and issues. PASS/FAIL checks often turn out to be 'change detection' systems rather than 'bug detection' systems. (A rough sketch of this kind of report-style check follows this list.)
  2. Keep the checks simple, really simple. It's difficult to write the complicated code needed to handle the various inputs, outputs and state changes a real system undergoes. That difficulty is the very reason we find work as testers; we are not immune to the problem of complex code.
  3. Be aware that these are just 'checks' and all they can do is report. They can never find the 'human stuff'. They can't question or investigate the system. Leave time and resources for using your own testing skill to tackle the system. For example, trying to get your checks to do a visual 'layout' check can lead to time-consuming problems (see 4). You could perform such a test yourself in seconds, and probably provide better feedback.
  4. Beware of using test automation with GUIs. GUIs are uniquely designed for human use; they:
  •  Update in human time frames, not at machine speeds - so you may 'check' at the wrong point in time. 
  •  Report information visually, and so use visual effects that can make test automation messier. 
  •  Are often out of sync with back-end server systems (their code runs in a browser or in a separate thread, etc.). 
  •  Require the test tool to emulate user behaviour when 'automating' a check - programming the keyboard events, mouse clicks/events etc. - making the test code more complicated. Look into accessing the system through other, more programmatic interfaces: for example, use the XML, JSON or RMI API 'behind' the GUI if it's available. You may be able to check much of the system's logic through this route. And if you can't, you might have found an issue - i.e. lots of the application logic is in the GUI when it might be better off on the server.
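
As a rough illustration of points 1 and 4 above, the sketch below records what a check actually observed (status code, response time, payload size) instead of asserting a single pass/fail expectation, and it does so against a hypothetical JSON endpoint 'behind' the GUI rather than driving the GUI itself. The URL, endpoint and report format are assumptions for illustration, not something from the original post.

```python
# A sketch of a 'report, don't pass/fail' check, assuming the requests library
# and a hypothetical JSON API sitting behind the GUI.
import csv
import time

import requests

ENDPOINT = "https://example.test/api/orders"  # hypothetical endpoint, for illustration only


def observe(session, url):
    """Record what actually happened, rather than asserting a single expectation."""
    started = time.time()
    response = session.get(url, allow_redirects=False, timeout=30)
    return {
        "url": url,
        "status": response.status_code,       # a 302 is recorded, not failed
        "elapsed_s": round(time.time() - started, 3),
        "size_bytes": len(response.content),
        "content_type": response.headers.get("Content-Type", ""),
    }


if __name__ == "__main__":
    with requests.Session() as session:
        observations = [observe(session, f"{ENDPOINT}?page={page}") for page in range(1, 11)]

    # Write a report for a human to review, graph or analyse later.
    with open("check_report.csv", "w", newline="") as report:
        writer = csv.DictWriter(report, fieldnames=observations[0].keys())
        writer.writeheader()
        writer.writerows(observations)
```

The point is that a 302, a slow response or an unusually small payload all end up in the report for a human (or a graph) to interpret, rather than being collapsed into a single red or green light.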


In summary, use the test code for what it's good at, and don't be afraid to report information back to a human who can look for issues. The computer can do the heavy lifting and grunt work, and we can do the smart work of interpreting the feedback.
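
To close the loop on that 'heavy lifting' point: once the checks produce reports rather than verdicts, a little code can sift them and hand the interesting rows to a human. A rough sketch, assuming the check_report.csv produced by the earlier example and some arbitrary thresholds:

```python
# The machine sifts the report; a human reviews the rows worth a look.
# File name and thresholds are illustrative assumptions.
import csv

with open("check_report.csv", newline="") as report:
    rows = list(csv.DictReader(report))

# Flag anything that deserves human attention: unusual status codes,
# or responses noticeably slower than the average.
mean_elapsed = sum(float(row["elapsed_s"]) for row in rows) / len(rows)
for row in rows:
    unusual_status = row["status"] not in ("200", "302")
    slow = float(row["elapsed_s"]) > 2 * mean_elapsed
    if unusual_status or slow:
        print(f"worth a look: {row['url']} status={row['status']} took {row['elapsed_s']}s")
```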

Comments

  1. Good stuff, Pete. I now have another link to point people to who tell me I should "automate everything" to fix my testing problems. Nice.

  2. The #1 reason I've seen for poor return on investment of automated tests (eg. they take way too much time to maintain, they don't find regression failures) is poor design. Too often, testers without adequate design skills and without good automation frameworks are told to automate tests. They end up with tests that do way too much per script, so that it's hard to pinpoint problems when the test fails. If something changes, they have to change it in 100 places instead of 1 because the test code isn't DRY.

    Rather than give people an excuse to avoid test automation, we should educate the people automating the tests on how to design them well, how to decide what to automate, how to continually refactor the test code for maintainability. Having a programmer and tester pair on automation tasks is ideal.

    My teams have been getting super ROI from tests for more than a decade. On my current team, 7 years after having zero automation, we have several regression suites running at all levels from unit to API to GUI, many times per day, alerting us immediately when something breaks. The tests also provide living documentation - they have to pass, so we have to keep them up to date.

    Please encourage people to learn good ways to automate tests, and don't give the impression that test automation is somehow a bad thing. That's probably not what you mean to say, but people looking for a reason not to have to learn something that's hard for them may interpret it that way.


