
Posts

Showing posts with the label exploratory testing

Podcast: VW Dieselgate and the $33bn b̶u̶g̶ feature

This is the story behind the VW emissions scandal, which has so far cost the company over $33bn. We look into the technology issues VW faced and the investigations that uncovered the problem. The MP3 (audio) file is available here.

Avoiding Wild Goose Chases While Debugging.

When I’m debugging a complex system, I’m constantly looking for patterns. I just ran this test code... What did I see in the log? I just processed a metric $^&*-load of data, did our memory footprint blip? I’m probably using every freedom-unit of screen space to tail logs, run a memory usage tool, run an IDE & debugger, watch a trace of API calls, run test code… And I’m doing this over and over. Then I see it. Bingo, that spike in API calls hits only when that process over there jumps to 20% processor usage when the app also throws that error. Unfortunately, I may have been mistaken. On a sufficiently complex system, the emergent behaviour can approach the appearance of randomness. Combinatorial explosions are for real, and they are happening constantly in your shiny new MacBook. My bug isn’t what I think it is. I’m examining so many variables in a system with dozens of subsystems at play, it's inevitable I will see a correlation. We know this more formally as
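
To see how easily a pattern turns up by chance, here is a minimal Python sketch. It illustrates spurious correlation across many comparisons and is not the author's own tooling. It generates 30 completely independent random "metrics" of 20 samples each and reports the most strongly correlated pair. Nothing links any of them, yet with 435 pairs to compare, one pair will usually look convincingly related. The metric count, sample count and seed are arbitrary assumptions.

```python
import random
import statistics
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_chance_correlation(n_metrics=30, n_samples=20, seed=1):
    """Generate independent random 'metrics'; return the most correlated pair and its r."""
    rng = random.Random(seed)  # arbitrary seed, purely illustrative
    metrics = [[rng.random() for _ in range(n_samples)] for _ in range(n_metrics)]
    # Compare every pair of metrics, just as we do when eyeballing many dashboards.
    best = max(combinations(range(n_metrics), 2),
               key=lambda p: abs(pearson(metrics[p[0]], metrics[p[1]])))
    return best, pearson(metrics[best[0]], metrics[best[1]])

pair, r = strongest_chance_correlation()
print(f"metrics {pair[0]} and {pair[1]} correlate with r = {r:.2f}, by pure chance")
```

Raise `n_metrics` and the best chance correlation tends to get stronger, which is exactly the trap when you watch logs, memory, CPU and API traces all at once.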

A h̶i̶t̶c̶h̶h̶i̶k̶e̶r̶'s̶ software tester's guide to randomised testing - Part 2

How would you test a water sac? (Whoa there, calm that tester brain... I know what you are thinking: What's it used for? Who or what uses it? How long does it need to last? Does the temperature of the water matter? Is it single use? etc. But let's assume a generic hiking or camping water sac for now.) I'm guessing one of your suggestions includes filling it with water, shaking it a bit and checking for leaks. Seems kind of obvious, right? But when it comes to software, we often do away with old-fashioned techniques such as filling something up and looking at it. Where's the machine learning test algorithm? Call this a BDD scenario? Can Selenium check for H₂O? I have to run this past the B.A... This is your software. We can treat randomly generated test data and inputs in much the same way as water. Data files or other inputs like user interactions are the ever-moving parts of our applications. Think about it, the code is entirely static - it's the state or data that i
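
Here is a minimal sketch of the "fill it up and look for leaks" idea in Python, assuming a hypothetical `slugify` function as the thing under test (swap in your own). It pours randomly generated strings through the function and records any input that makes it crash or break a couple of simple invariants: the output uses only `a-z`, `0-9` and `-`, and running it twice changes nothing. The invariants, alphabet and run count are illustrative choices, not a prescribed method.

```python
import random
import re
import string
from typing import Callable, List, Tuple

def slugify(text: str) -> str:
    """Hypothetical function under test: turn a title into a URL slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")

def random_text(rng: random.Random, max_len: int = 40) -> str:
    """Random input: printable ASCII (including whitespace) plus a few non-ASCII characters."""
    alphabet = string.printable + "éßø漢字😀"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))

def fill_and_look_for_leaks(func: Callable[[str], str] = slugify,
                            runs: int = 1000, seed: int = 42) -> List[Tuple[str, str]]:
    """Pour random inputs through func and collect any that break a simple invariant."""
    rng = random.Random(seed)  # fixed seed so any 'leak' can be replayed exactly
    leaks = []
    for _ in range(runs):
        text = random_text(rng)
        try:
            out = func(text)
        except Exception as exc:  # a crash counts as a leak too
            leaks.append((text, repr(exc)))
            continue
        if not re.fullmatch(r"[a-z0-9-]*", out):
            leaks.append((text, f"unexpected characters in {out!r}"))
        elif func(out) != out:
            leaks.append((text, "not idempotent"))
    return leaks
```

The stand-in `slugify` comes back dry; passing a deliberately sloppy function such as `lambda t: t.lower()` produces plenty of leaks.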

How did you find that bug? Are we sitting comfortably? Then I'll begin.

How did you find that bug? They asked this with a sort of puzzled "he dun't thunk like uz" look on their faces. An expression that suggested they were unsure whether to commend the discovery or gather their pitchforks and organise a well-overdue witch burning. Likewise, I now knew why they needed me. The team members were genuinely hard-working people trying to build something new and exciting. But they lacked one thing: someone exploring and asking questions, trying to find out new things about their application. Exploring is literally a step into the unknown, and that can be uncomfortable for those not experienced in how to do it well. So how did I find that bug? It's easy to tell a story of how I tried that particular input value because... Paragraph 3 of v4.6 of the requirements document stated that the user shall indeed on occasion X given input Y in Chrome v62 do... Or spout some other overly verb

The gamification of Software Testing

A while back, I sat in on a planning meeting. Many planning meetings slide awkwardly into a sort of ad-hoc technical analysis discussion, and this was no exception. With a little prompting, the team started to draw up what they wanted to build on a whiteboard. The picture spoke its thousand words, and I could feel that the team now understood what needed to be done. The right questions were being asked, and initial development guesstimates were approaching common-sense levels. When the discussion came around to testing, the team skipped over how they might test the feature and focused immediately on how long testing would take. When I probed how the testing would be performed, and how we might find out what the team did wrong, confused faces stared back at me. During our ensuing chat, I realised that they had been using BDD scenarios [only] as a metric of what testing needs to be done and when they are ready to ship. (Now I knew why I was hired to help.) There is nothing wrong with c

Thank you for finding the bug I missed.

Thank you to the colleague/customer/product owner who found the bug I missed. That oversight was (at least in part) my mistake. I've been thinking about what happened and what that means to me and my team.

Giving thanks. It helps us remember.

I'm happy you told me about the issue you found, because you...

1) Opened my eyes to a situation I'd never have thought to investigate.
2) Gave me another item for my checklist of things to check in future.
3) Made me remember that we are never done testing.
4) Showed me that we are never sure if the application 'works' well enough.
5) Reminded me to explore more and build less.
6) Gave me a reason to request that we assign more time to finding these issues.
7) Let me experience hindsight bias, so that the edge case now seems obvious!

Even the errors are broken!

An amused but slightly exasperated developer once turned to me and said, "I not only have to get all the features correct, I have to get the errors correct too!" He was referring to the need to implement graceful and useful failure behaviour for his application. Rather than presenting the customer or user with an error message or stack trace, give them a route to succeed in their goal, e.g. find the product they seek or even buy it. Graceful failure can take several forms. Take a look at this Bing [search] Suggestions bug in Internet Explorer 11, which demonstrates ungraceful failure. As you can see, the user is presented with a useful feature, most of the time. But should they paste a long URL into the location bar, they get hit with an error message. There are multiple issues here. What else is allowing this to happen to the user? The user is presented with an error message - why? What could the user possibly do with it? Bing Suggestions does not
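
A minimal Python sketch of graceful failure for a suggestions feature, assuming a hypothetical `fetch_suggestions` backend that chokes on very long input (the 256-character limit is made up). Instead of surfacing the backend's error, the wrapper quietly returns no suggestions, logs the problem for the developers, and leaves the user free to carry on with their search or navigation. This is not how Bing or Internet Explorer actually work, just an illustration of the behaviour described above.

```python
import logging
from typing import Callable, List

MAX_QUERY_LENGTH = 256  # made-up limit, purely for illustration

def fetch_suggestions(query: str) -> List[str]:
    """Hypothetical suggestion backend that fails on very long input."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("query too long")
    return [f"{query} reviews", f"{query} price"]

def suggestions_for_user(query: str,
                         fetch: Callable[[str], List[str]] = fetch_suggestions) -> List[str]:
    """Return suggestions, degrading to none rather than showing the user an error."""
    if len(query) > MAX_QUERY_LENGTH:
        # e.g. a pasted long URL: suggestions add little, so don't even ask.
        return []
    try:
        return fetch(query)
    except Exception:
        # Record the failure for developers; the user can still search or navigate.
        logging.getLogger(__name__).exception("suggestion lookup failed for %r", query)
        return []
```

The design choice is that the error goes to a log someone can act on, while the user simply sees fewer options rather than a message they can do nothing with.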

VW behaving badly.

I now cover this issue in more detail in my podcast! The EPA (the US government's Environmental Protection Agency) recently issued a Notice of Violation regarding the emissions from Volkswagen cars. Volkswagen is actually a group of brands, so the Notice affects other cars such as Audi, Porsche and Skoda. A lot of the focus has been on what was going on in Volkswagen, for example: who knew what was being done? Did the VW testers know? Did they pass the details on? What interests me is the wider issue: how could this have been possible for so long (since 2009)? If so many cars were affected, and for so long, why didn't we hear about this sooner? Why isn't there a team of people assigned to finding this stuff out... Oh wait, there is... In the UK these emissions tests are governed by the Vehicle Certification Agency, answering to the Department for Transport. One might expect the manufacturer to be less inclined to investigate the car's emissions, after all te

Testing, Testing, 1, 2, 3.

When I have a spare moment, I usually try to think about how to test something. In fact that's not true; what I actually do is test something. It might be an app on my phone, an online tool, a parking-ticket machine or a search engine. Usually it is whatever is to hand at the time. This is a good way to practise my skills, and can take as long as I have free. In fact, having only moments is beneficial: you soon get better at finding more issues, more quickly. For example, a few moments ago I thought I'd test Google's currency converter. If you haven't seen it, it looks like this: you enter a value and two currencies in the format shown, and Google will give you an answer with great precision. (I haven't examined the accuracy.) Starting from this, I varied the text slightly, using "euro" instead of "EUR" and also swapping "gbp" and "euro" to see how precedence affected the results. This seemed to behave as expected, but it did
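
Varying the query like this can be made systematic. The Python sketch below assumes a query shape of `<amount> <currency> in <currency>` and a small, made-up alias table; neither comes from Google's documentation. It produces every combination of spelling, case and direction, which you could then try by hand or feed into a tool, comparing the answers for consistency.

```python
from itertools import product
from typing import List

# Made-up alias table; extend it with whatever spellings you want to probe.
ALIASES = {
    "EUR": ["EUR", "eur", "euro", "euros"],
    "GBP": ["GBP", "gbp", "pound", "pounds sterling"],
}

def query_variants(amount: str, src: str, dst: str) -> List[str]:
    """Build conversion queries varying spelling, case and direction (assumed query shape)."""
    queries = []
    for a, b in ((src, dst), (dst, src)):  # also swap the direction of conversion
        for x, y in product(ALIASES[a], ALIASES[b]):
            queries.append(f"{amount} {x} in {y}")
    return queries

for q in query_variants("10", "EUR", "GBP")[:3]:
    print(q)
```

Converting one way and then back again is a handy consistency check: the two answers should be close to reciprocals of each other.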

Testing as War?

We are fighting an invincible opponent. The legions of bugs in our software far outnumber our attempts to find them all. Even the simplest of software releases inevitably contains a fifth column of hidden, pre-existing bugs or quirks that, combined with our changes, could strike at any time. The question we need to answer as testers is: how can we win, or at least not lose, this battle? Military examples and analogies can be useful in software testing, and not just those in reconnaissance. For example, the Millennium Challenge. This pre-Gulf War 2 military exercise pitted two forces against one another in a Middle East scenario. In summary, the modern US military was fighting a rogue element in a smaller country. With its vast resources, the western power should have faced few problems. But in fact the former US general playing the role of the 'rogue nation' trounced the western forces in a devastating blow that saw several warships sunk. How did the 'rogue' general do

Are you sure you've "completed" testing? A Guardian Content API example.

Testing doesn't complete. It might end, it might finish, but it doesn't complete. There's too much to test. If you ever need confirmation of this, test something, something that's been tested already. Better still, test a piece of software you know has been tested by someone you think is a brilliant tester. A good tester like you will still find new issues, ambiguities and bugs. That's because the complexity of modern software is huge: as well as all the potential paths through your code, there are the paths through all the underlying code and the near-infinite domain of data it might process. That's part of the beauty of testing: you have to be able to get a handle on this vast test space. That is, review a near-infinite test space in a [very] finite time frame. We are unable to give a complete picture of the product to our clients. But we are also free to find new issues that have so far eluded others. In fact the consequences are potentially more drama
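
Some rough arithmetic shows why "complete" is out of reach. The Python sketch below multiplies a handful of input dimensions for a single search request; the counts are invented for illustration, not taken from any real API. Because every combination is a distinct case, the sizes multiply rather than add.

```python
import math

# Purely illustrative counts for the inputs to one search request (assumptions, not measurements).
DIMENSIONS = {
    "query terms": 10_000,
    "sections": 70,
    "date ranges": 365,
    "page sizes": 50,
    "sort orders": 3,
}

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def count_cases(dimensions: dict) -> int:
    """Every combination of independent inputs is a distinct case: the counts multiply."""
    return math.prod(dimensions.values())

def years_to_run(cases: int, cases_per_second: float) -> float:
    """Wall-clock years to execute every case once at a given rate."""
    return cases / cases_per_second / SECONDS_PER_YEAR

cases = count_cases(DIMENSIONS)
print(f"{cases:,} cases, about {years_to_run(cases, 10):.0f} years at 10 cases/second")
# prints: 38,325,000,000 cases, about 121 years at 10 cases/second
```

Even at ten automated checks per second, running every combination once would take over a century, and that ignores the content behind each response.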

Is your test automation actually agile? A Guardian Content API example.

In my last post I discussed how test automation could be used to do things that I couldn't easily do unaided: in that example, executing thousands of news 'content searches' and helping me sort through them. With the help of some simple test automation, I found some potential issues with the results returned by the REST API. In that case, I started out with the aim of implementing a tool. But your testing might not lead you that way; often your own hands-on investigation finds an issue, but you don't know how widespread it is. Is it a one-off curiosity, or a sign of something bigger? Again, this is where test automation can help and, if done well, without being an implementation or maintenance burden. Many test automation efforts are blind to the very Agile idea of YAGNI, or You Ain't Gonna Need It. They often presume to know all that needs to be tested in advance, deciding to invest most of their time writing 'tests' blindly against a specific
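
A minimal Python sketch of that kind of throwaway automation: given a hunch from hands-on testing, run many queries and count how often a quick check fires. The `fetch` argument is assumed to be any callable returning a list of result dictionaries (for example, your own wrapper around the Content API search endpoint), and `title_lacks_term` is an illustrative heuristic, not a known bug. The point is that the script is small enough to write in minutes and discard afterwards, in the spirit of YAGNI.

```python
from collections import Counter
from typing import Callable, Iterable, List

def how_widespread(queries: Iterable[str],
                   fetch: Callable[[str], List[dict]],
                   looks_wrong: Callable[[str, dict], bool]) -> Counter:
    """Run every query and count, per query, the results a quick check flags."""
    flagged = Counter()
    for query in queries:
        for result in fetch(query):
            if looks_wrong(query, result):
                flagged[query] += 1
    return flagged

def title_lacks_term(query: str, result: dict) -> bool:
    """Illustrative heuristic: a hit whose title never mentions the search term."""
    return query.lower() not in result.get("webTitle", "").lower()
```

One flagged query suggests a curiosity worth a closer look; dozens across unrelated queries suggest a pattern worth reporting.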

Test automation that helps, A Guardian Content API example.

Have you ever had to test an API that's accessible over the internet? Or even one that's available internally within your organisation? They often take the form of a REST service (or similar) through which other software can easily access information in a machine-readable form. Even if you are not familiar with these APIs, you've probably heard of or seen the results of them. Some examples of APIs are the Twitter API, Flickr and the Guardian's Open Platform. Some examples of what people have built using the Flickr API are published on the Flickr site. Despite being 'machine-readable', they are often human-readable, greatly helping you test and debug them. Companies use these APIs to ease the distribution of their content, to encourage community and commercial development around their content, or simply to provide a clear and documentable line between their role as data provider and where the consumer's role begins. When testing an API like the above, many
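
For a feel of how approachable these APIs are, here is a minimal Python sketch that builds a search URL for the Guardian's Content API and pulls the human-readable titles out of a response. The endpoint `https://content.guardianapis.com/search`, the `q`, `page-size` and `api-key` parameters and the `response`/`results`/`webTitle` fields reflect my understanding of the public documentation and should be checked against it; the sample response body is made up and far smaller than a real one.

```python
import json
from typing import List
from urllib.parse import urlencode

# Endpoint and parameter names per my reading of the public docs; verify before relying on them.
SEARCH_ENDPOINT = "https://content.guardianapis.com/search"

def build_search_url(query: str, api_key: str = "test", page_size: int = 10) -> str:
    """Build a Content API search URL with URL-encoded query parameters."""
    params = {"q": query, "page-size": page_size, "api-key": api_key}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"

def titles_from_response(body: str) -> List[str]:
    """Pull the human-readable titles out of a search response body."""
    payload = json.loads(body)
    return [item["webTitle"] for item in payload["response"]["results"]]

# Made-up, heavily trimmed sample body for offline experimentation.
SAMPLE_BODY = '''{"response": {"status": "ok", "results": [
  {"id": "technology/example-id", "webTitle": "An example headline"}]}}'''

print(build_search_url("open data"))
print(titles_from_response(SAMPLE_BODY))
```

Fetching the URL with `urllib.request.urlopen` and passing the decoded body to `titles_from_response` gives you a quick, readable list to eyeball.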

Testing Mindset

Once upon a time there was a young and naive tester who was new to the world of software testing. He often felt he didn't have what it took to be a tester. Sure, he found the odd bug, and he enjoyed his work, but he also often missed bugs, issues or problems. After a while, he admitted to himself that this was a problem and decided to seek help. He stood up from his desk and walked over to his test manager's desk. His manager was wise and experienced. He was the Mr Miyagi of testing, and as such was always offering zen-like advice to his team. A simple question about where the stapler had escaped to could turn into a cryptic series of haiku, leaving our young tester baffled. Our novice explained his problem, and his concern that maybe he wasn't cut out for testing. The wise test manager smiled, thought for a moment and then opened his little Moleskine notebook. He turned carefully through the pages, settled on a page, looked up and said: "I over

If it's not good testing, it's not good regression testing either.

Pick a coin from your pocket and hold it at arm's length. Take a good look. Now take another one of the same denomination and hold it out at arm's length as before. Based on your observations alone, can you say they are identical? Let's go a step further. If someone had given you one coin to look at, then exchanged it for another, could you have determined whether they were the same or different coins? Maybe, yes? If the differences had been large enough, e.g. one coin was heavily tarnished or scratched, then the different coins would be identifiable. Or if you'd been given the opportunity to examine the coin using magnifying equipment, you probably could have found differences. But let's assume our only test was a standard set of checks, i.e. viewing at arm's length and comparing what we see with our notes/records. It's better than nothing; I would see some differences, and some might be important ones. For example, if my next coin was blank, I might have suspected an issue with
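
The coin analogy translates fairly directly into code. In this minimal Python sketch (the `Coin` type and its attributes are made up), `arms_length_check` plays the role of a fixed regression check that compares only what it was told to look at, while `close_inspection` examines every attribute. A coin from a different year, with a few scratches, sails through the first check; a blank coin does not.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Coin:
    """Made-up model of a coin; the attributes are illustrative."""
    denomination: str
    colour: str
    design: str
    year: int
    scratches: int

def arms_length_check(expected: Coin, actual: Coin) -> bool:
    """The 'standard set of checks': compare only what is visible at arm's length."""
    visible = ("denomination", "colour", "design")
    return all(getattr(expected, name) == getattr(actual, name) for name in visible)

def close_inspection(expected: Coin, actual: Coin) -> list:
    """Name every attribute that differs, as if examined under magnification."""
    return [f.name for f in fields(Coin)
            if getattr(expected, f.name) != getattr(actual, f.name)]

noted = Coin("10p", "silver", "lion", 2010, 0)    # our notes/records
swapped = Coin("10p", "silver", "lion", 2016, 3)  # a different coin
blank = Coin("10p", "silver", "", 2010, 0)        # a blank coin

print(arms_length_check(noted, swapped), close_inspection(noted, swapped))
print(arms_length_check(noted, blank), close_inspection(noted, blank))
```

The fixed check is cheap and repeatable, but it can only ever report differences in the attributes someone decided to look at in advance.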

Into the testing hinterland.

Why do we refer to our ancestors as cavemen? The evidence, of course! The cave paintings, the rubbish piles found in caves all round the world. It's simple: cavemen lived in caves, they painted on the walls and threw rubbish into the corner of the cave. Thousands of years later we find the evidence, demonstrating they lived in caves. Hence the moniker 'caveman'. How many caves have you seen? Seriously, how many have you seen or even heard of? Now I'm lucky: as a former resident of Nottingham [in the UK], I've at least heard of a few. But if you think about it, you probably haven't seen that many. Even assuming you've seen a fair few, how many were dry, spacious and safe enough for human habitation? As you can guess, my point is: there probably isn't a great selection of prime cave real estate available. It doesn't add up: the whole of mankind descended from cave [dwelling] men? Before you roll your eyes and think I'm some sort of Creationist,