Investigating Software

Tuesday, 2 August 2016

Synecdoche

A common but often unnoticed figure of speech is the synecdoche. When I say “Beijing opened its borders”, we know I mean “The People's Republic of China has opened its borders.” That’s a synecdoche: in this case I named part of something (Beijing) to mean the whole (the P.R.C.).

Conversely, I might say “Westminster is in turmoil” when anyone with knowledge of British politics will know I mean “The politicians in the Houses of Parliament are in turmoil”. The reader will know I am not referring to the City of Westminster, a region of London (or the place in Canada, etc.).

Synecdoche can be a useful and illustrative tool of conversation, helping to convey the size or importance of the subject or to illustrate a subtlety of the situation in more detail. For example, “Beijing opened its borders” also indicates the power of that country's central government: some residents of one city in China can open [or close] the borders of a vast country spanning thousands of miles and comprising over 1.3 billion people.

Synecdoches can also lead to ambiguity, and they are particularly dependent on context. For example, the same phrase “Westminster is in turmoil” accompanied by a picture of a derailed train, smoke and ambulances would lead the reader to assume the geographic region of Westminster was being referred to.

Just this sort of language, and potential for confusion, exists within software development. For example, a Product Owner might ask a team to code a feature for her App. A technical lead would likely know her team will actually analyse, converse, script, code, test, fix, report, document, review etc., and probably do this across multiple systems, before she can agree with the Product Owner that the App’s feature is complete or ‘coded’.

Why don’t technical leads get annoyed by this narrow description of the work? Well, actually they do, all the time. When working as a Scrum Master and Program Manager I frequently had to smooth these sorts of negotiations. Often a technical lead or test lead would take the Product Owner's choice of word (e.g. “code” or “develop”) to mean that the work required was not significant, when the Product Owner’s words could have been translated as “do clever stuff to make it happen”.

Product Owners were often not from a programming or testing background. Occasionally they would not use the same jargon as developers or, more often, they used the same terms but with their own meanings. For example, using ‘code’ to mean the whole software development and release process.

While some friction would be caused in circumstances where someone might use the wrong or, to the team, misleading jargon, the team usually adapted. The team might use the jargon between themselves, but then adopt a less ‘technical’ (their words) language style when talking to others. That is, people outside the core team.

Testing also has situations where we frequently say one thing, and rely on context to mean so much more or less. ‘Test automation’, for example. This simple term can cover a range of tools, techniques and even approaches.

In my experience, ‘Test automation’ has for example referred to test data generators or shell scripts. These would check data-outputs were within a valid range, given data-inputs of historical purchases. I have also worked with successful teams where the term test automation meant random input generators combined with a simple run-until-crash check.

Furthermore, I have worked on systems where ‘Test Automation’ results could be red / green / pass / fail style messages reported from a GUI or API based test tool. In another team our results could only have been usefully discerned with the aid of graphing software. On some projects the skilled expertise of a statistician was required to decide whether our test code had uncovered an issue. On occasion, the term 'Test Automation' could mean several or all of the above.

When talking with my team, I need to be more specific. I, like them, have to be able to describe what I’m doing and why. I could just say “I’m doing test automation”, but that would be like a developer stating “I’m doing feature X”. Having a precise way to describe my work, and how it relates to the work of my team members, is valuable and time saving - not just in the time spent not re-explaining and clarifying concepts, but more importantly in not having to re-do things we thought were complete or correct the first time.

Having the words to describe our work in detail is invaluable. The sorts of things we talk about within a team are jargon-heavy. For example, I might need to explain to my team that I’m coding a check for the product's UTF-16 surrogate pair handling, to be added to the Continuous Integration process, and that this might mean we don’t complete a feature this sprint. I may need to clarify that I’m writing a script to be used as an oracle - to aid our User Interface testing - or ask the programmers to include a testability hook to aid our log file analysis.

The language used to communicate these ideas is important. The language and terms themselves are worthy of at least some discussion. If we as a team are unfamiliar with the terms, or their differing contextual meanings, we will likely end up very confidently and quietly not knowing what we are doing all day.

Monday, 21 December 2015

Your software sucks (any data you give it)

At 1524h on the afternoon of January 15th 2009, US Airways Flight 1549 was cleared for takeoff from Runway 4 at New York's LaGuardia airport. The airplane carried 150 passengers and 5 flight crew, on a flight to Charlotte Douglas, North Carolina. The Airbus A320's twin CFM56 engines had been serviced just over a month prior to the flight. The plane climbed to a height of 859m (2818 feet) before disaster struck.

Passengers reported hearing several loud bangs and then seeing flames from the engines' exhausts. Shortly thereafter both engines shut down, robbing the Airbus of thrust and its primary source of electrical power.

At this point the Captain took over from the First Officer, and between them they spent the next 3 minutes looking for somewhere to land while also desperately trying to restart the aircraft's engines.

What Happened?

A flock of birds had crossed the path of the Airbus and several had struck the plane. Both engines had ingested birds and shut down as a result. A shut-down is the FAA-required minimum standard behaviour for a jet engine.

An Emirates engine after a bird strike.
The safe automatic shut-down of a jet engine is a scenario tested for by engine manufacturers before they can be certified for use.

Worse things might happen, e.g. the broken unbalanced blades might continue blowing air into the fuel rich combustion chamber while red-hot engine fragments are jettisoned outwards into other parts of the fuel-laden plane.

Viewed in that light: a graceful shut-down is not a bad minimum safety requirement.

If we think about it, jet engines need a good deal of testing. After all, they:
  • Are mission critical
  • Work faster than humans can think and react
  • Are expensive and time consuming to build
  • Have to be integrated with other complex systems
  • Have to accept un-validated inputs (like birds)
Does any of that sound familiar? That last one in particular is relevant to the field of software development and testing.

Un-validated input? How do they test that?


One of the tests that can be performed on a jet engine is to fire frozen poultry into it. The engine ingests a turkey at high speed, in an attempt to simulate a bird being sucked into the engine during flight.

Like many technical systems that deal directly with the outside world, software can have serious problems when exposed to unusual inputs. Like the jet, the point of ingest literally cannot be protected - something has to 'process' what’s coming in.

As software testers working with applications that need to handle these situations, we need to learn how to perform our own frozen-turkey-tests and examine how our complex systems handle them (a minimal sketch follows the list below).
  • Do they crash?
  • If they crash, is that OK? 
  • What have I learned?
  • What were the side effects?
  • Can I restart it, or is it now 'corrupted'?
  • What is the likelihood of failure?
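
As a hedged sketch of what such a 'frozen turkey' harness might look like in Python: parse_order() below is a hypothetical stand-in for whatever ingests outside-world data in your system, and the awkward inputs are just examples.

    # A hedged "frozen turkey" harness: throw awkward inputs at the point of
    # ingest and record what happens. parse_order() is a hypothetical stand-in.
    def parse_order(value):
        # Hypothetical target: it quietly assumes everything is plain ASCII text.
        return value.encode("ascii").decode().strip().upper()

    awkward_inputs = [
        "",                        # nothing at all
        "A" * 1_000_000,           # far too much
        "\u202Eorder#42",          # a right-to-left override character
        "\U0001F4A9",              # an emoji - a surrogate pair in UTF-16
        "0; DROP TABLE orders;",   # hostile-looking text
        b"\xff\xfe\x00",           # raw bytes that are not valid UTF-8
    ]

    for value in awkward_inputs:
        try:
            parse_order(value)
            outcome = "survived"
        except Exception as error:
            # A crash is not automatically a bug, but it raises the questions above:
            # what were the side effects, and can the system carry on afterwards?
            outcome = f"crashed: {error!r}"
        print(repr(value)[:40].ljust(40), outcome)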
The sort of websites we use every day have to accept largely un-validated inputs: they are on the internet, and they have to deal with anything our computers can send.

But surely it's just text, right?
If it's not 'normal', block it!

That isn’t going to work for long... For example, Google has to handle anything you want to find on any website, even if you accidentally include some right-to-left data in your search:




...Or you want to find out how to do that cool emoticon on your new Microsoft Surface notebook keyboard... Microsoft.com then needs to handle that query.


...Or you don't want to pay extra on your phone bill just because you used a smiley face in your text message.

These are real-world examples of things people use their software for, every day. Hence they are the sort of things we need to test for, lest our users end up going elsewhere or find they are being overcharged.

Tools such as No More ASCII can help us test websites, by giving us direct access to a range of Unicode 'code-points' that may cause problems for our software.
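
As a minimal sketch (in Python, and not the No More ASCII add-on itself), here is the kind of 'awkward code-point' list such tools make easy to reach; each escape below is a real Unicode character that often shakes out problems:

    # A few Unicode code-points that frequently cause trouble, ready to paste
    # into input fields. This is an illustrative list, not the add-on's own.
    tricky = {
        "right-to-left override": "\u202Eevil.exe",
        "zero-width space":       "foo\u200Bbar",
        "combining accent":       "Ze\u0301ro",          # 'e' plus COMBINING ACUTE ACCENT
        "surrogate pair (emoji)": "\U0001F600",          # GRINNING FACE: 2 UTF-16 code units
        "full-width digits":      "\uFF11\uFF12\uFF13",  # looks like 123, but isn't ASCII
    }

    for name, value in tricky.items():
        print(f"{name:24} {value!r}")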

The problems can be subtle - more than just something 'not looking right'. The complex way in which languages are represented in your application can mean that simple things, such as measuring the length of a string, can fail. (string.online-toolz.com)
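
For example, the 'length' of one short string already has several defensible answers, depending on whether you count code points, UTF-16 code units (what JavaScript's String.length counts) or bytes. A minimal Python 3 sketch:

    s = "I \U0001F49C Unicode"                   # contains PURPLE HEART, outside the BMP

    print(len(s))                                # 11 code points (Python's view)
    print(len(s.encode("utf-16-le")) // 2)       # 12 UTF-16 code units (JavaScript's .length)
    print(len(s.encode("utf-8")))                # 14 bytes on the wire in UTF-8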




Sorting can also fail. If your text is reversed, for example, it may not render correctly afterwards:


These two issues are caused by the website not being able to properly process Unicode text, in particular the UTF-16 flavour of Unicode. Some characters (or graphemes, as user-perceived characters are called in Unicode) are in fact made up of two 16-bit parts, or 'code units', when encoded in UTF-16. So whilst many characters fit in a single code unit, some need a pair. These pairs are referred to as 'surrogate pairs'.

Why does the reverse-string function fail? It appears to be putting the emoticon's two code units in reverse order, when it shouldn't: they should be treated together as one character when a reverse or sort is performed. (When the individual code units in a surrogate pair are swapped, they become meaningless.)
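
Python 3 strings are sequences of whole code points, so they don't suffer from this directly, but we can simulate what a naive, UTF-16 code-unit based reverse does and compare it with a reverse that keeps the pair together. A minimal sketch:

    # Simulate a naive reverse that works one UTF-16 code unit at a time,
    # which is exactly what breaks surrogate pairs.
    def reverse_utf16_code_units(text: str) -> bytes:
        data = text.encode("utf-16-le")
        units = [data[i:i + 2] for i in range(0, len(data), 2)]   # 2 bytes per code unit
        return b"".join(reversed(units))

    s = "ab\U0001F4A9"                 # 'ab' followed by an emoji (a surrogate pair in UTF-16)

    try:
        print(reverse_utf16_code_units(s).decode("utf-16-le"))
    except UnicodeDecodeError as error:
        print("Swapped surrogate halves are meaningless:", error)

    print(s[::-1])                     # reversing whole characters keeps the pair together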

How to reverse a UTF-16 text string with a Surrogate Pair in it.

These 'surrogate pairs' cover things like emoticons, musical notation etc. While such characters were not widely used on computers in 1960s North America - and are therefore not in ASCII - they are now widely used all around the globe.

Un-validated text input is a great example of where tool-assisted testing can uncover a wealth of knowledge about our applications. Given the wide domain of possible inputs and the unknown complexity of the app, this is an inherently exploratory process. Having the right tools on hand helps you gain that knowledge quicker.

You can read more about how to explore how your browser/app handles Unicode.

Thursday, 10 December 2015

Even the errors are broken!

An amused but slightly exasperated developer once turned to me and said "I not only have to get all the features correct, I have to get the errors correct too!". He was referring to the need to implement graceful and useful failure behaviour for his application.

Rather than present the customer or user with an error message or stack trace, give them a route to succeed in their goal - e.g. find the product they seek or even buy it.
Bing Suggestions demonstrates ungraceful failure.

Graceful failure can take several forms. Take a look at this Bing [search] Suggestions bug in Internet Explorer 11.

As you can see, the user is presented with a useful feature most of the time. But should they paste a long URL into the location bar, they get hit with an error message.

There are multiple issues here. What else is allowing this to happen to the user? The user is presented with an error message - Why? What could the user possibly do with it? Bing Suggestions does not fail gracefully.
I not only have to get all the features correct, I have to get the errors correct too!  -Developer
In this context, presenting the user with an error message is a bug - probably worse than the fact that the suggestions themselves don't work. If they failed silently, the number of users who were consciously affected would probably be greatly reduced.
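
As a hedged sketch of that 'fail silently, but tell the team' idea: fetch_suggestions() below is a hypothetical stand-in for whatever the suggestions feature calls behind the scenes.

    import logging

    logger = logging.getLogger("suggestions")

    def fetch_suggestions(query: str) -> list[str]:
        # Hypothetical stand-in: pretend very long queries break the back end.
        if len(query) > 200:
            raise ValueError("query too long for the suggestion service")
        return [query + " ideas", query + " examples"]

    def suggestions_for(query: str) -> list[str]:
        try:
            return fetch_suggestions(query)
        except Exception:
            # Log it for the team; the user just sees no suggestions, not an error.
            logger.exception("Suggestions failed for a query of length %d", len(query))
            return []

    print(suggestions_for("graceful failure"))                   # normal case
    print(suggestions_for("http://example.com/" + "x" * 300))    # fails silently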

By causing the software to fail, we often appear to be destructive, but again we are learning more about the application, through its failure. Handling failures gracefully is another feature of the software that is important to real users - in the real world. The user wants to use your product to achieve their goal. They don't want to see every warning light that displays in the pilot's cockpit. Just tell them if they need to put their seat-belt on.

Monday, 7 December 2015

Counting Images, a FireFox Add-on

Many of my clients ask me to test their content management and processing systems. Often this involves investigating how the software handles images of various sizes as well as text of various lengths or types.

To help create test images, I created this little FireFox Add-on. The Counting Images add-on starts with one click and can be used to create an image of a custom size.

For example: if you need a 300x250 MPU advert image - just enter 300 and 250 into the panel and click Create Image. To download the image, just click on it - as you would a link - and choose Save.

The image files are named widthxheight.png, and include markings to help identify if they have been truncated e.g.:



The marked numbers refer to the size in pixels of the rectangle they are in. E.g.: the blue rectangle (always the outermost one) is 150x100 pixels in size.

Another example:
As you can see, the rectangles start at the defined size and count down in steps of 20 pixels.
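
The add-on itself runs inside FireFox, but as a hedged sketch of the same idea, here is a small Python script (assuming the Pillow library is installed) that produces a similar style of test image: nested rectangles shrinking in 20 pixel steps, each labelled with its own size.

    from PIL import Image, ImageDraw

    def counting_image(width, height, step=20):
        img = Image.new("RGB", (width, height), "white")
        draw = ImageDraw.Draw(img)
        colours = ["blue", "red", "green", "orange", "purple"]
        inset = 0
        i = 0
        while width - 2 * inset > step and height - 2 * inset > step:
            w, h = width - 2 * inset, height - 2 * inset
            colour = colours[i % len(colours)]
            draw.rectangle([inset, inset, width - 1 - inset, height - 1 - inset],
                           outline=colour)
            draw.text((inset + 2, inset + 2), f"{w}x{h}", fill=colour)
            inset += step
            i += 1
        return img

    counting_image(300, 250).save("300x250.png")   # named widthxheight.png, as above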

What could go wrong? Well, a good example is very thin and tall images. The image edge might actually truncate the text specifying its height, e.g.:


The image here is 30x1001, but the narrowness means the visible text reads 30x100.
 


Friday, 4 December 2015

Learning from the Boeing 787's broken software.

Earlier this year Boeing 787 engineers were given some new instructions by the FAA (the US government's Federal Aviation Administration). They were informed that if the aeroplane's electrical generators were left running for 248 days, they would enter fail-safe mode.

In plain English: they will stop producing electrical power. This short video looks into why that might be and how this information can help us test our software.
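
The video isn't reproduced here, but the widely suggested explanation is a signed 32-bit counter of hundredths of a second overflowing. A quick back-of-the-envelope check of that arithmetic (an illustration of the sum, not Boeing's actual code):

    max_signed_32bit = 2**31 - 1      # largest value a signed 32-bit integer can hold
    ticks_per_second = 100            # one tick every 10 milliseconds

    seconds = max_signed_32bit / ticks_per_second
    days = seconds / (60 * 60 * 24)
    print(f"{days:.2f} days")         # roughly 248.55 days, matching the 248 days above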




Tuesday, 1 December 2015

Bug Automation

At many of my clients, more effort is spent on 'test automation' than on other forms of testing or quality assurance. That can be the right choice; for example, I worked on a Data Warehousing project where we needed to write some test automation before we could test the data and its processing.

Many other projects in different technology areas also spend a lot of time on their test automation. To be precise, they spend an increasing amount of time fixing & maintaining old 'tests' and 'frameworks'.

There are great tools around to help us write these automated checks quickly. But as with many software systems: maintenance, in the long term, is where the time and money goes. That is why I'm surprised we don't use short term automation more. We have the skills.

One good example of short-term automation is Bug Automation: a simple script / executable that recreates or demonstrates a bug. This isn't a new idea - I've been doing it for years and I know other people have too.

It's common on open source projects to report an issue with example code, to clarify the exact issue you are reporting. It's a quick way to demonstrate the issue.

I'm not referring here to the idea of building a regression test suite from the 'tests' (checks) from each bug fix. You can do that if you want - it can be very useful - but then you are back to a maintenance overhead.

By Bug Automation I'm referring to a disposable script that proves the system is broken. We can falsify the assumption that we have 'working' software. We can't prove the system is bug-free with our automation, but we can show it's broken.

The automation isn't there to indicate when we have fixed the issue - but to highlight that we, as a team, have created one.

In many situations a quick chat, screen-shot or URL is enough to help a developer fix a bug. But not always. For example, a tool like BlueBerry Test Assistant could help demonstrate a bug quicker than I can explain it. But in some contexts the best tool is code.

For example: I discovered a security flaw in an open source Content Management System used by several large media corporations, including my client at the time. I could have described the issue to people, but that would have been a poor substitute for an actual demonstration.

It's hard to persuade someone that their 'secure' random token generator isn’t random - it's easier to show them. So I wrote some Bug Automation and sent this along with a summary of the issue. (And together we figured out a more secure solution.)
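
That Bug Automation isn't reproduced here, but as a hypothetical illustration of the kind of script that persuades better than prose: the generator below is seeded from the clock, so anyone who can guess roughly when a token was issued can recreate it.

    import random
    import time

    def weak_token(seed):
        # Stand-in for a 'secure' token generator that is seeded from the clock.
        rng = random.Random(seed)
        return "".join(rng.choice("abcdef0123456789") for _ in range(32))

    issued_at = int(time.time())
    victim_token = weak_token(issued_at)

    # An attacker who knows roughly when the token was issued just tries nearby seeds.
    for guess in range(issued_at - 5, issued_at + 5):
        if weak_token(guess) == victim_token:
            print("Recovered the 'random' token by guessing the seed:", guess)
            break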

Another simple example: Google has a minor bug whereby if you enter Hebrew or Arabic text (with white-space) the full stop on the 'Press Enter...' message is placed at the wrong end of the sentence.


While the issue isn't hard to describe or screen-grab (see above), recreating the issue might not be so easy. Therefore we can create some simple Bug Automation, like this.
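
The original post links to its own script; as a hedged sketch of what that sort of Bug Automation might look like (assuming Selenium and a Chrome driver are installed, and bearing in mind Google's page has changed since 2015):

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://www.google.com")
        search_box = driver.find_element(By.NAME, "q")
        search_box.send_keys("שלום עולם ")   # Hebrew text followed by white-space
        # Leave the browser open so a teammate can see where the full stop lands.
        input("Look at the 'Press Enter to search' hint, then press Return to quit.")
    finally:
        driver.quit()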

Other members of your team can run this script and see the issue on their own PC. They don't have to figure out how to type right-to-left languages, or battle an OS or bug tracking system that doesn’t like you to copy and paste such things. Used purely as a communication aid, it also doesn’t have the maintenance overhead of trying to maintain a 'proof' of a fix long term.

Bug Automation is already a multimillion dollar industry: it's called the Zero Day Exploit industry. Unfortunately that automation is often used for nefarious purposes. But as an example of positive deviance, it might be wise to pick up on the clever things other developers & testers are doing, and use them ourselves, for good.

Friday, 27 November 2015

VW behaving badly.

The EPA (the US government's Environmental Protection Agency) recently issued Notices of Violation regarding the emissions from Volkswagen cars. Volkswagen is actually a group of brands, therefore the Notices affect other cars such as Audi, Porsche and Skoda.

A lot of the focus has been on what was going on inside Volkswagen - for example, who knew what was being done? Did the VW testers know? Did they pass the details on, etc.?

What interests me is the wider issue of how this could have been possible for so long (since 2009). If so many cars were affected, and for so long, why didn’t we hear about this sooner? Why isn’t there a team of people assigned to finding this stuff out... Oh wait, there is...

In the UK these emissions tests are governed by the Vehicle Certification Agency, answering to the Department for Transport.

One might expect the manufacturer to be less inclined to investigate their cars' emissions; after all, testing costs money (less profit). I might also expect them to exploit the test rules and tolerances as best they could. This behaviour, while not ethical, is explainable given their motivations and incentives.

I'm even understanding of the mistaken belief that they can 'prove' their cars are compliant. This is highlighted in this quote from Vauxhall/Opel/GM when the BBC asked about possible irregularities in their vehicles' NOx emissions:

"We have in-house testing that proves that the Zafira 1.6 meets all the legal emission limits."

A curious statement, given that the systems concerned are software controlled, and, as Dijkstra put it: "Testing shows the presence, not the absence of bugs".

An independent tax-funded regulatory body is in theory acting in our interests, the vehicle buyers and breathers of the emissions. So why did they not discover the issue? A closer look at the 'tests' themselves gives some clues. Here are a few points worth noting:

1) The test is carried out at a controlled temperature of 20-30 degrees centigrade. At first this might seem OK to non-testers. But look up the average temperatures, in the hottest month, of a few European locations:

 Bonn       August  18°C (64°F)
 London     July    19°C (66°F)
 Lisbon     July    24°C (74°F)
 Paris      July    20°C (68°F)
 Brussels   July    18°C (64°F)
 Rome       July    26°C (78°F)
 Vienna     July    19°C (66°F)
 Stockholm  July    18°C (64°F)


You begin to see that this rule is suspect. E.g.: in Paris, even in the hottest month, you would only meet this criterion in real life roughly half the time.

2) The relevant UK/EU test dates back to 1996. Some parts of the test date back 40 years. Odd, given that the Engine Control Units usually responsible for managing emissions behaviour were only introduced in the 1980s & 90s (less than 40 years ago).

3) The procedure is highly predictable and repeatable - it always took 20mins 20secs to complete.

4) The rules require the 'driver' to stay within 2km/h (1.2mph) of an 'ideal' speed throughout the test.


In summary: old, highly scripted and rigidly enforced checks were performed in an unrealistic environment. The emissions test isn't really testing at all. The procedure is a successful attempt to provide a repeatable, scripted acceptance-test of a system's behaviour.

The system's behaviour was developed so that, when the car was driven in a defined manner, all the checks passed. The car can pass the test, but this provides no indication as to whether this is normal behaviour, or what might occur in any number of other, more realistic situations.
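
As a deliberately simplified, hypothetical illustration of why such a predictable procedure is easy to game (this is the post's point expressed in code, not a description of VW's actual mechanism): if the observed conditions match the published cycle, the software can simply switch calibration.

    IDEAL_SPEED_PROFILE = {0: 0, 60: 15, 120: 32, 180: 50}   # made-up sample points: seconds -> km/h

    def looks_like_the_official_test(ambient_temp_c, elapsed_s, speed_kmh):
        ideal = IDEAL_SPEED_PROFILE.get(elapsed_s)
        return (20 <= ambient_temp_c <= 30                   # the controlled lab temperature
                and elapsed_s <= 20 * 60 + 20                # the fixed 20min 20sec duration
                and ideal is not None
                and abs(speed_kmh - ideal) <= 2)             # within 2 km/h of the 'ideal' speed

    def emissions_calibration(ambient_temp_c, elapsed_s, speed_kmh):
        if looks_like_the_official_test(ambient_temp_c, elapsed_s, speed_kmh):
            return "test calibration (low NOx)"              # passes the scripted check
        return "road calibration"                            # what real-world driving gets

    print(emissions_calibration(25, 120, 31))                # looks like the lab
    print(emissions_calibration(12, 120, 31))                # too cold to be the lab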

On a BBC Panorama programme, a former Automotive Type Approval Engineer, talking about how cars have been passing the emissions tests only in the most unrealistic of conditions, is quoted as saying:
"...Testing the wrong things, in the wrong way, for quite a while"

This wasn’t testing, but it was done in the name of testing. Sound familiar?