
Are you sure you've "completed" testing? A Guardian Content API example.

Testing doesn't complete. It might end, it might finish, but it doesn't complete. There's too much to test. If you ever need confirmation of this, test something that has already been tested. Better still, test a piece of software you know has been tested by someone you think is a brilliant tester. A good tester like you will still find new issues, ambiguities and bugs.

That's because the complexity of modern software is huge: as well as all the potential paths through your own code, there are all the paths through the underlying code, and the near-infinite domain of data it might process. That's part of the beauty of testing: you have to be able to get a handle on this vast test space. That is, review a near-infinite test space in a [very] finite time-frame.

We are unable to give a complete picture of the product to our clients. But we are also free to find new issues that have so far eluded others. In fact the consequences are potentially more dramatic. We will always be sampling a sub-section of the potential code, data and inputs. The unexplored paths will always outnumber the mapped paths, so the number of undiscovered issues is always going to be greater than the number already found. Or at least, we will not have the time and/or resources to prove otherwise. As such, it's the tests you haven't run, or even dreamed of, that are probably the most significant.

As I learn, I become better equipped to see more issues in the software. My new knowledge allows me to better choose which regions of the software's behaviour to examine. I can ask questions that previously did not even occur to me. Each new question opens up a new part of that near-infinite set of tests I've yet to complete.

For example, I learned that some Unicode characters can have multiple representations. The representations are equivalent, but one may use two codepoints to represent a character where the other uses a single codepoint for the same character. A good example is the letter A with a grave accent:
À
À

Depending on your browser/OS they might look the same or different. Changing the font might help distinguish between them:
À
À 

My text editor actually renders them quite differently, even though they are meant to display the same:



Until I knew about this feature of Unicode, I didn't know to ask the right questions. How would the software handle this? Could it correctly treat these as equal? This whole area of testing would not have been examined if I hadn't taken the time to learn about this 'Canonical Equivalence' property of Unicode normalisation.
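
As a rough sketch of what the two forms look like at the byte level (using printf and od; the hex values are simply the UTF-8 encodings of the codepoints involved):

# Pre-composed form: U+00C0, LATIN CAPITAL LETTER A WITH GRAVE (UTF-8 bytes c3 80)
printf '\xc3\x80\n' | od -An -tx1

# Decomposed form: 'A' (U+0041) followed by U+0300, COMBINING GRAVE ACCENT (UTF-8 bytes 41 cc 80)
printf 'A\xcc\x80\n' | od -An -tx1

Canonical equivalence means that a system which normalises its input (for example to NFC) should treat both byte sequences as the same character.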

This is a situation where I would actively avoid using most test automation until I was clear about my understanding of the potential issues. So I stopped using my previous scripts and used cURL. The benefit of cURL is that it gives me direct and visible control over what I request from a site/API. It will make the exact request I ask of it, with very little fuss and certainly no frills. I can be sure it's not going to try to encode or interpret what I'm requesting, but rather repeat it verbatim.
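
In practice that just means commands of roughly this shape (a sketch; the -s flag simply hides cURL's progress meter):

# The percent-encoded bytes in the URL are sent exactly as typed, with no re-encoding
curl -s 'http://content.guardianapis.com/search?q=%C3%80lex&format=json'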

This example had an interesting result when used against the Guardian Content API. My first tests included this query to the Guardian's Content Search:

The non-combining query, including the letter À (capital A with a grave accent as a single codepoint, percent-encoded as %C3%80):

Query:
http://content.guardianapis.com/search?q=%C3%80lex&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "didYouMean":"alex",
    "results":[]
  }
}

...and the query with the combining characters, consisting of the regular 'A' and a separate grave accent (%CC%80):

Query:
http://content.guardianapis.com/search?q=A%CC%80lex&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "results":[]
  }
}

At first glance these two responses look fairly similar, but a closer look shows that the first includes a didYouMean field. In theory the two queries should be treated equivalently; this difference suggested they were not, though in itself it was a fairly minor issue. As a tester I knew I had to examine it further and find out how big/bad the difference could be.

Rather than slip back into automation, I realised that what I needed was an example that demonstrated the potential magnitude of the issue. This was a human problem, or opportunity: I needed an example that would clearly show an issue with one representation of the characters and not with the other. So I needed a query that could be affected by these differences and, if interpreted correctly, would deliver many news results. The answer was Société Générale, a high-profile and recent news story with a non-ASCII, accented company name.

The non-combining query, using a single codepoint to represent the accented 'e':
Query:
http://content.guardianapis.com/search?q=Soci%c3%a9t%c3%a9+G%c3%a9n%c3%a9rale&format=json

Response (partial):
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":536,
    "startIndex":1,
    "pageSize":10,
    "currentPage":1,
    "pages":54,
    "orderBy":"newest",
    "results":[{
      "id":"business/2011/aug/14/economic-burden-debt-crisis-euro",
      "sectionId":"business",
      "sectionName":"Business",
      "webPublicationDate":"2011-08-14T00:06:13+01:00",
      "webTitle":"The financial burden of the debt crisis could lead countries to opt out of the euro",
      "webUrl":"http://www.guardian.co.uk/business/2011/aug/14/economic-burden-debt-crisis-euro",
      "apiUrl":"http://content.guardianapis.com/business/2011/aug/14/economic-burden-debt-crisis-euro"
    },{
...

As you can see, there are over 500 results for this query.


The combining query, using two codepoints to represent the accented 'e':

Query:
http://content.guardianapis.com/search?q=Socie%cc%81te%cc%81+Ge%cc%81ne%cc%81rale&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "didYouMean":"sofiété Générace",
    "results":[]
  }
}

This response shows that the query found 0 results, and suggested something else.
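
As an aside, if you have jq installed, the two result counts can be pulled straight out of the responses (jq is an extra convenience here, not something the comparison depends on):

curl -s 'http://content.guardianapis.com/search?q=Soci%c3%a9t%c3%a9+G%c3%a9n%c3%a9rale&format=json' | jq '.response.total'
# 536 at the time of writing

curl -s 'http://content.guardianapis.com/search?q=Socie%cc%81te%cc%81+Ge%cc%81ne%cc%81rale&format=json' | jq '.response.total'
# 0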

At this point it looked like there was an issue. But how could I be sure? Maybe the Unicode NFC behaviour was purely hypothetical and not used in reality. So I needed an oracle, something that would help me decide whether this behaviour was a bug. I switched to another news search system, one that generally seems reliable and would be respected in a comparison: Google News.

Firefox 5 (Mac OS X) renders these characters differently, but Google returns the same results.
Note: Google Chrome renders no discernible difference.

I used cURL to make two queries to the Google News site, using the two different character representations. This required a minor tweak, modifying cURL's user-agent so the requests were not blocked by Google. The results showed that Google returned almost the same results for both versions of "Société Générale". There were some minor differences, but these appeared to be inconsistent and possibly unrelated. The significant feedback from these Google News pages was that Google returns many results for both forms of character representation, and those results are virtually identical. It would therefore appear that there is an issue with the Guardian's handling of these codepoints.
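
For reference, the Google News queries took roughly this shape (the URL and parameters here are illustrative rather than a record of the originals; -A overrides cURL's default User-Agent so the request looks like it came from a browser):

curl -s -A 'Mozilla/5.0' 'http://www.google.com/search?tbm=nws&q=Soci%C3%A9t%C3%A9+G%C3%A9n%C3%A9rale'
curl -s -A 'Mozilla/5.0' 'http://www.google.com/search?tbm=nws&q=Socie%CC%81te%CC%81+Ge%CC%81ne%CC%81rale'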

Thanks to this investigation, we have learned of another possible limitation in the Guardian Search API, a limitation that could mean a user would not find news related to an important and current news event. This kind of investigation is at the heart of good testing: results learned from testing are quickly analysed, compared with background knowledge and used to generate more and better tests. Tools are selected for their ability to support this process, increasing the clarity of our results without forcing us to write unneeded code in awkward DSLs.

Comments

  1. Really interesting blog Pete - thanks for posting. It's a great example of how exploratory testing can add value over just adding further automation.

    It also clearly shows your iterative thinking in trying to identify what potential 'nasty' bugs may be hiding behind an innocuous symptom of failure

  2. Interesting blog - especially because I am the product manager for the Guardian Open Platform. Our dev team and I would like to catch up with you - if nothing else, to say thank you properly for the nasties you caught :-)
    I can be reached at sharath.bulusu at guardian.co.uk. Looking forward to meeting.

