
Are you sure you've "completed" testing? A Guardian Content API example.

Testing doesn't complete. It might end, it might finish, but it doesn't complete. There's too much to test. If you ever need confirmation of this, test something, something that's been tested already. Better still, test a piece of software you know has been tested by someone you think is a brilliant tester. A good tester like you will still find new issues, ambiguities and bugs.

That's because the complexity of modern software is huge: as well as all the potential paths through your own code, there are all the paths through the underlying code and the near-infinite domain of data it might process. That's part of the beauty of testing: you have to be able to get a handle on this vast test space. That is, review a near-infinite test space in a [very] finite time-frame.

We are unable to give a complete picture of the product to our clients. But we are also free to find new issues that have so far eluded others. In fact the consequences are potentially more dramatic. We will always be sampling a sub-section of the potential code, data and inputs. The unexplored paths will always outnumber the mapped paths. As such, the number of undiscovered issues is always going to be greater than the number already found; or at least, we will not have the time and/or resources to prove otherwise. So it's the tests you haven't run, or even dreamed of, that are probably the most significant.

As I learn, I become better equipped to see more issues in the software. My new knowledge allows me to better choose which regions of the software's behaviour to examine. I can ask questions that previously did not even occur to me. Each new question opens up a new part of that near-infinite set of tests I've yet to complete.

For example, I learned that some Unicode characters can have multiple representations. The representations are equivalent, but one form may use two codepoints to represent a character while the other uses a single codepoint. A good example is the letter A with a grave accent:
À
À

Depending on your browser/OS they might look the same or different. Changing the font might help distinguish between them:
À
À 

My text editor actually renders them quite differently, even though they are meant to display the same:



Until I knew about this feature of Unicode, I didn't know to ask the right questions. How would the software handle this? Could it correctly treat these as equal? This whole area of testing would not have been examined if I hadn't taken the time to learn about this 'Canonical Equivalence' property of Unicode normalisation.
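
As a rough illustration of that equivalence, here is a minimal Python sketch (an addition for this write-up, not part of the original investigation): the two forms compare unequal codepoint-for-codepoint, yet become equal once normalised.

# Canonical equivalence: U+00C0 versus U+0041 followed by combining U+0300.
import unicodedata

precomposed = "\u00C0lex"   # 'À' as a single codepoint, followed by 'lex'
combining = "A\u0300lex"    # 'A' plus a combining grave accent, followed by 'lex'

print(precomposed == combining)                                # False - different codepoints
print(unicodedata.normalize("NFC", combining) == precomposed)  # True - equal after NFC
print(unicodedata.normalize("NFD", precomposed) == combining)  # True - equal after NFD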

This is a situation where I would actively avoid using most test automation until I was clear in my understanding of the potential issues. So I stopped using my previous scripts and used cURL. The benefit of cURL is that it gives me direct and visible control of what I request from a site/API. It will make the exact request I ask of it, with very little fuss and certainly no frills. I can be sure it's not going to try to encode or interpret what I'm requesting, but rather repeat it verbatim.
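
To make the difference on the wire concrete, a short Python sketch (again, an illustration rather than part of the cURL session) shows how the two forms percent-encode. These are the %C3%80 and %CC%80 sequences that appear in the queries below.

# Percent-encoding the two representations of 'Àlex' for use in a URL.
from urllib.parse import quote

print(quote("\u00C0lex"))   # %C3%80lex  - the single-codepoint form
print(quote("A\u0300lex"))  # A%CC%80lex - 'A' plus a combining grave accent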

This approach produced an interesting result when used against the Guardian Content API. My first tests included this query to the Guardian's Content Search:

The non-combining query, including the letter À (capital A with a grave accent, or %C3%80 as a single character):

Query:
http://content.guardianapis.com/search?q=%C3%80lex&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "didYouMean":"alex",
    "results":[]
  }
}

...and the query with the combining characters, consisting of the regular 'A' and a separate grave accent (%CC%80):

Query:
http://content.guardianapis.com/search?q=A%CC%80lex&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "results":[]
  }
}

At first glance these two results look fairly similar, but a closer look shows the first response includes a didYouMean field. In theory these two queries should be treated equivalently. This minor difference suggested they were not, but it was also a fairly minor issue. As a tester I knew I had to examine it further: how big/bad could this difference be?

Rather than slip back into automation, I realised that what I needed was an example that demonstrated the potential magnitude of the issue. This was a human problem, or opportunity: I needed an example that would clearly show an issue in one representation of the characters and not in the other. So I needed a query that could be affected by these differences and, if interpreted correctly, deliver many news results. The answer was Société Générale, a high-profile and recent news story with a non-ASCII, accented company name.

The non-combining query, using a single codepoint to represent the accented 'e':
Query:
http://content.guardianapis.com/search?q=Soci%c3%a9t%c3%a9+G%c3%a9n%c3%a9rale&format=json

Response (partial):
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":536,
    "startIndex":1,
    "pageSize":10,
    "currentPage":1,
    "pages":54,
    "orderBy":"newest",
    "results":[{
      "id":"business/2011/aug/14/economic-burden-debt-crisis-euro",
      "sectionId":"business",
      "sectionName":"Business",
      "webPublicationDate":"2011-08-14T00:06:13+01:00",
      "webTitle":"The financial burden of the debt crisis could lead countries to opt out of the euro",
      "webUrl":"http://www.guardian.co.uk/business/2011/aug/14/economic-burden-debt-crisis-euro",
      "apiUrl":"http://content.guardianapis.com/business/2011/aug/14/economic-burden-debt-crisis-euro"
    },{
...

As you can see there are over 500 results with this query.


The combining query, using two codepoints to represent the accented 'e':

Query:
http://content.guardianapis.com/search?q=Socie%cc%81te%cc%81+Ge%cc%81ne%cc%81rale&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "didYouMean":"sofiété Générace",
    "results":[]
  }
}

This response shows that the query found 0 results, and suggested something else.
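
As an aside, a short Python sketch (illustrative only, and not a description of how the Guardian API is implemented) shows how normalising the decoded query would make the two forms identical:

# Decode both percent-encoded queries and compare, before and after NFC normalisation.
import unicodedata
from urllib.parse import unquote_plus

precomposed = unquote_plus("Soci%c3%a9t%c3%a9+G%c3%a9n%c3%a9rale")
combining = unquote_plus("Socie%cc%81te%cc%81+Ge%cc%81ne%cc%81rale")

print(precomposed == combining)                                # False as sent
print(unicodedata.normalize("NFC", combining) == precomposed)  # True once normalised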

At this point it looked like there was an issue. But how could I be sure? Maybe the Unicode NFC behaviour was purely hypothetical, not used in reality. So I needed an oracle, something that would help me decide whether this behaviour was a bug. I switched to another news search system, one that generally seems reliable and would be respected in a comparison: Google News.

Firefox 5 (Mac OS X) renders these characters differently, but Google returns the same results.
Note: Google Chrome renders no discernible difference.

I used cURL to make two queries to the Google News site, using the two different queries. This required a minor tweak to modify cURL's user-agent, to stop it being blocked by Google. The results showed that Google returned almost the same results for both versions of "Société Générale". There were some minor differences, but these appeared to be inconsistent, possibly unrelated. The significant feedback from these Google News pages was that Google returns many results for both forms of character representation, and those results are virtually identical. It would therefore appear that there is an issue with the Guardian's handling of these codepoints.
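
Something like the following Python sketch approximates that comparison. The news-search URL and User-Agent string here are illustrative placeholders, not the exact values used with cURL.

# Request the same news search with both character representations, sending a
# browser-like User-Agent so the request is not rejected as a bare script.
from urllib.parse import quote
from urllib.request import Request, urlopen

SEARCH_URL = "https://news.google.com/search?q="  # assumed endpoint, for illustration
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6) Firefox/5.0"

for query in ("Soci\u00e9t\u00e9 G\u00e9n\u00e9rale",       # precomposed 'é'
              "Socie\u0301te\u0301 Ge\u0301ne\u0301rale"):  # 'e' plus combining acute
    request = Request(SEARCH_URL + quote(query), headers={"User-Agent": USER_AGENT})
    body = urlopen(request).read().decode("utf-8", errors="replace")
    print(quote(query), "->", len(body), "bytes returned")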

Thanks to this investigation, we have learned of another possible limitation in the Guardian Search API, a limitation that could mean a user would not find news related to an important and current news event. This kind of investigation is at the heart of good testing: results learned from testing are quickly analysed, compared with background knowledge and used to generate more and better tests. Tools are selected for their ability to support this process, increasing the clarity of our results without forcing us to write unneeded code in awkward DSLs.

Comments

  1. Really interesting blog Pete - thanks for posting. It's a great example of how exploratory testing can add value over just adding further automation.

    It also clearly shows your iterative thinking in trying to identify what potential 'nasty' bugs may be hiding behind an innocuous symptom of failure.

  2. Interesting blog - especially because I am the product manager for the Guardian Open Platform. Our dev team and I would like to catch up with you - if nothing else, to say thank you properly for the nasties you caught :-)
    I can be reached at sharath.bulusu at guardian.co.uk. Looking forward to meeting.

