
Testing a maybe with machine learning.

“I figured it was just a jumbo jet.”

My son and I shake our heads & then adopt blank stares as if a non-body-snatcher has been exposed in our midst.

“Twin engine,” I utter, as I glance skyward again.

“Single decker,” my son adds by way of explanation.

“It’s a plane,” she retorts, rolling her eyes.

My wife (who is far smarter than I am) lacks the ability my son and I share to recognise aircraft. She has the typical person’s ability to recognise aeroplanes. I grew up around air force bases, and my father was an aircraft engineer: years of exposure to aeroplanes, and of explanations of their mechanics and features.

We took this to the next level...
My son is an avid flight sim player and has consumed many hours of relevant YouTube material on the subject. He has also had the luck/misfortune of me discussing the planes that frequent the skies above us here, near London.

Given our combined experience & expertise, we probably have a reasonable ability to recognise the make & model of the planes we see in the sky. I would have given us, combined, an accuracy rate of say 90%.

A while back I worked on a Deep Learning classification model that could recognise types of aircraft from pictures. It was fairly crude and managed an 80% accuracy rate against a test set of aircraft images. 

Is that good? That would depend on a lot of things:
  • The purpose of the tool
  • Alternative systems’ performance
  • The existing recognition system in place
  • Risks the new system might expose a company to
  • The cost
  • The time/context when the device was used
  • Ability to update the ‘model’ as new planes are released
  • Etc
Those questions are probably similar to those you would be asking regarding any software you find yourself testing. 

One subtlety that is missing from many software systems is an explicit measure of accuracy. Often when a software system is performing a calculation or logical process, we assume 100% accuracy and test for it. That often makes sense, but sometimes those figures and calculations are inherently inaccurate from the user’s point of view. They are just simplified models of real-world systems.

Contrary to much of the negative publicity Machine Learning receives, the accuracy of a model/tool can often be measured directly. A machine learning approach could make the assumed inaccuracy of the tool more explicit.

For example, we could get an overall idea of how accurate the model is (e.g. 80% for our aircraft model), but also how sure it is (i.e. the probability) of each answer it outputs (e.g. it's 95% sure it's an Airbus A380, and 50% sure it's a Boeing 747).
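To make those two numbers concrete, here is a minimal sketch in Python. It is not the code behind my aircraft model: the labels, the raw scores and the tiny test set are all made up for illustration, and I am assuming a classifier that turns raw scores into probabilities with a softmax, which is common but not universal.

```python
import math

def softmax(scores):
    """Turn raw model scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(labels, scores):
    """Return the most likely label and how sure the model is of it."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

def accuracy(predicted, actual):
    """Fraction of test examples where the prediction matches the truth."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical labels and raw scores for a single picture.
LABELS = ["Airbus A380", "Boeing 747", "Boeing 777"]
label, confidence = top_prediction(LABELS, [4.0, 1.0, 0.5])
print(f"{label}: {confidence:.0%} sure")

# Hypothetical test set: five pictures, one of them misclassified.
actual = ["Airbus A380", "Boeing 747", "Boeing 777", "Boeing 777", "Airbus A380"]
predicted = ["Airbus A380", "Boeing 747", "Boeing 747", "Boeing 777", "Airbus A380"]
overall = accuracy(predicted, actual)
print(f"Overall accuracy: {overall:.0%}")
```

The first number describes a single answer; the second describes the model as a whole. A tester will usually want both.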

Furthermore, the data upon which it is trained can be defined and recorded for later review and analysis. This could be by programmers, testers, product owners, lawyers, prospective customers, etc. That's not always easy to do if the existing system is a person.

As a tester, you might also locate or create your own data to test the system more thoroughly, checking for edge cases and real-world situations that may not already have been modelled. E.g. what does the model classify a flock of Canada Geese as? Given realistic data, is the model biased towards giving Airbus planes a higher score? (We could test that...)
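Both of those questions can be roughed out in a few lines of Python. This is a sketch with made-up data, not output from a real model. It assumes every label starts with the manufacturer's name, and the `classify_or_reject` helper and its 0.6 threshold are hypothetical: a real model may have no way of saying "I don't know" unless somebody builds one.

```python
from collections import Counter

def manufacturer(label):
    """Assumes labels look like 'Airbus A380', i.e. manufacturer first."""
    return label.split()[0]

def prediction_skew(actual, predicted):
    """How often each manufacturer is predicted, relative to how often it
    really appears in the test set. Above 1.0 means over-predicted."""
    actual_counts = Counter(manufacturer(a) for a in actual)
    predicted_counts = Counter(manufacturer(p) for p in predicted)
    return {m: predicted_counts[m] / n for m, n in actual_counts.items()}

def classify_or_reject(probabilities, threshold=0.6):
    """Return the top label, or 'unknown' when the model is not sure enough.
    The threshold is an illustrative value, not a recommendation."""
    label, p = max(probabilities.items(), key=lambda item: item[1])
    return label if p >= threshold else "unknown"

# Hypothetical test set in which the model leans towards Airbus.
actual = ["Airbus A380", "Airbus A320", "Boeing 747",
          "Boeing 777", "Boeing 737", "Boeing 787"]
predicted = ["Airbus A380", "Airbus A320", "Airbus A380",
             "Boeing 777", "Airbus A320", "Boeing 787"]
skew = prediction_skew(actual, predicted)
print(skew)

# Hypothetical probabilities for a photo of a flock of Canada Geese.
geese = {"Airbus A380": 0.40, "Boeing 747": 0.35, "Boeing 777": 0.25}
print(classify_or_reject(geese))
```

A skew well above 1.0 for one manufacturer is a prompt for further investigation rather than proof of bias: the test data itself might be unrepresentative.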

We can approach these systems as a form of mechanised heuristics. While they work slightly differently to our human-ware heuristics, they behave in a similar way. They are fallible, but they are useful shortcuts that can really help in many situations.

For example, they could be replicated and deployed at will in a manner that existing people or systems cannot. Will the product work better or more efficiently than the existing approach? (The answer could be, for example, that it's less accurate than an existing person, but the unit cost is much lower - and so overall efficiency wins on a larger deployment.)
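A back-of-the-envelope version of that trade-off, in Python. The accuracy figures are the ones mentioned earlier (roughly 90% for the humans, 80% for the model); the costs per identification are invented purely to show the arithmetic.

```python
def cost_per_correct(cost_per_attempt, accuracy):
    """Average spend needed to get one correct identification."""
    return cost_per_attempt / accuracy

# Costs are hypothetical; accuracies are the rough figures from this post.
person = cost_per_correct(cost_per_attempt=2.00, accuracy=0.90)
model = cost_per_correct(cost_per_attempt=0.05, accuracy=0.80)

print(f"Person: {person:.2f} per correct identification")
print(f"Model:  {model:.4f} per correct identification")
```

This ignores the cost of a wrong answer, which in some contexts will dwarf everything else.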

While the roles being automated will still exist, there will be fewer people doing them. E.g. why have ten aircraft spotters when two who are skilled in both identifying the UFOs and updating the machine learning models might fit the business needs more efficiently?

As we continue to mechanise more business functions using machine learning, Software Testers will need to start thinking a little more in terms of comparing accuracy rather than a brittle approach of binary pass/fail correctness. Given our industry’s struggles with the Pass/Fail mentality, I suspect this will be one of our greatest challenges in the coming years.
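One way to picture the shift is a check that compares a measured accuracy against a baseline and is allowed to answer "inconclusive", instead of a bare pass/fail. The Python below is a sketch under stated assumptions: it uses the simple normal approximation for a 95% interval, which is rough for small samples, and the function names and sample counts are made up.

```python
import math

def accuracy_interval(correct, total, z=1.96):
    """Approximate 95% interval for an accuracy estimate
    (normal approximation; crude when total is small)."""
    p = correct / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - margin), min(1.0, p + margin)

def compare_to_baseline(correct, total, baseline):
    """Say whether the measured accuracy is clearly better or worse
    than the baseline, or whether we cannot tell from this sample."""
    low, high = accuracy_interval(correct, total)
    if low > baseline:
        return "better"
    if high < baseline:
        return "worse"
    return "inconclusive"

# Hypothetical runs: the same 80% hit rate, measured on 200 and on 10 pictures,
# compared with a 90% baseline.
print(compare_to_baseline(correct=160, total=200, baseline=0.90))
print(compare_to_baseline(correct=8, total=10, baseline=0.90))
```

The same 80% hit rate leads to different conclusions depending on how much evidence sits behind it, which is exactly the nuance a binary pass/fail hides.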

Comments

  1. Well, here's a test. How does your system cope with something it cannot possibly be seeing as a real-world input - say, a Handley Page HP.42? And how would it deal with something that it might theoretically see, but the likelihood of seeing it is quite remote - say, a Dakota, or a Junkers Ju.52? Would it attempt a close match, or just reject input sightings that did not match the tool's parameters?

    Context is important here. Being 100% accurate isn't that important if the app is only going to be used by hobbyists. But if you were marketing it to the military, 100% accurate identification becomes much more important. IFF - Identification Friend or Foe - has been around for a long time, as I'm sure I don't need to tell you!

