Are you sure you've "completed" testing? A Guardian Content API example.

Testing doesn't complete. It might end, it might finish, but it doesn't complete. There's too much to test. If you ever need confirmation of this, test something that's been tested already. Better still, test a piece of software you know has been tested by someone you consider a brilliant tester. A good tester like you will still find new issues, ambiguities and bugs.

That's because the complexity of modern software is huge: as well as all the potential paths through your own code, there are all the paths through the underlying code and the near-infinite domain of data it might process. That's part of the beauty of testing: you have to get a handle on this vast test space. That is, review a near-infinite test space in a [very] finite time-frame.

We are unable to give a complete picture of the product to our clients. But we are also free to find new issues that have so far eluded others. In fact the consequences are potentially more dramatic. We will always be sampling a subsection of the potential code, data and inputs. The unexplored paths will always outnumber the mapped paths, so the number of undiscovered issues is always going to be greater than the number already found. Or at least, we will not have the time and/or resources to prove otherwise. As such, it's the tests you haven't run, or even dreamed of, that are probably most significant.

As I learn, I become better equipped to see more issues in the software. My new knowledge allows me to better choose which regions of the software's behaviour to examine. I can ask questions that previously I did not even think of. Each new question opens up a new part of that near-infinite set of tests I've yet to complete.

For example, I learned that some Unicode characters can have multiple representations. The representations are equivalent, but one may use two codepoints to represent a character where another uses a single codepoint. A good example is the letter A with a grave accent:
À
À

Depending on your browser/OS they might look the same or different. Changing the font might help distinguish between them:
À
À 

My text editor actually renders them quite differently, even though they are meant to display the same:



Until I knew about this feature of Unicode, I didn't know to ask the right questions. How would the software handle this? Could it correctly treat these as equal? This whole area of testing would not have been examined if I hadn't taken the time to learn about this 'canonical equivalence' property of Unicode normalisation.
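
As a quick illustration, a minimal Python sketch shows the two forms comparing unequal as raw strings, yet equal once both are normalised to the same form:

import unicodedata

precomposed = "\u00C0"   # 'À' as a single codepoint (U+00C0)
combining = "A\u0300"    # 'A' followed by a combining grave accent (U+0300)

print(precomposed == combining)          # False: different codepoint sequences
print(unicodedata.normalize("NFC", precomposed) ==
      unicodedata.normalize("NFC", combining))   # True: canonically equivalent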

This is a situation in which I would actively avoid using most test automation, until I was clear about my understanding of the potential issues. Therefore I stopped using my previous scripts, and used cURL. The benefit of cURL is that it gives me direct and visible control of what I request from a site/API. It will make the exact request I ask of it, with very little fuss and certainly no frills. I can be sure it's not going to try to encode or interpret what I'm requesting, but rather repeat it verbatim.
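
For example, percent-encoding the UTF-8 bytes of each form (a minimal Python sketch) shows exactly what gets sent on the wire, with no re-interpretation:

from urllib.parse import quote

precomposed = "\u00C0lex"   # 'Àlex', with a single-codepoint À
combining = "A\u0300lex"    # 'Àlex', with 'A' plus a combining grave accent

print(quote(precomposed))   # %C3%80lex  -> the non-combining query below
print(quote(combining))     # A%CC%80lex -> the combining query below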

This example produced an interesting result when used against the Guardian Content API. My first tests included these queries to the Guardian's Content Search:

The non-combining query, including the letter À (capital A with a grave accent, encoded as the single codepoint %C3%80):

Query:
http://content.guardianapis.com/search?q=%C3%80lex&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "didYouMean":"alex",
    "results":[]
  }
}

...and the query with combining characters, consisting of a regular 'A' followed by a separate combining grave accent (%CC%80):

Query:
http://content.guardianapis.com/search?q=A%CC%80lex&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "results":[]
  }
}

At first glance these two results look fairly similar, but a closer look shows that the first response includes a didYouMean field. In theory these two queries should be treated equivalently; this difference suggests they were not, though on its own it was a fairly minor issue. As a tester I knew I had to examine it further and find out how big, or how bad, the difference could be.

Rather than slip back into automation, I realised that what I needed was an example that demonstrated the potential magnitude of the issue. This was a human problem, or opportunity: I needed an example that would clearly show an issue in one representation of the characters and not in the other. So I needed a query that could be affected by these differences and that, if interpreted correctly, would deliver many news results. The answer was Société Générale, a high-profile and recent news story with a non-ASCII, accented company name.

The non-combining query, using a single codepoint to represent the accented 'e':
Query:
http://content.guardianapis.com/search?q=Soci%c3%a9t%c3%a9+G%c3%a9n%c3%a9rale&format=json

Response (partial):
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":536,
    "startIndex":1,
    "pageSize":10,
    "currentPage":1,
    "pages":54,
    "orderBy":"newest",
    "results":[{
      "id":"business/2011/aug/14/economic-burden-debt-crisis-euro",
      "sectionId":"business",
      "sectionName":"Business",
      "webPublicationDate":"2011-08-14T00:06:13+01:00",
      "webTitle":"The financial burden of the debt crisis could lead countries to opt out of the euro",
      "webUrl":"http://www.guardian.co.uk/business/2011/aug/14/economic-burden-debt-crisis-euro",
      "apiUrl":"http://content.guardianapis.com/business/2011/aug/14/economic-burden-debt-crisis-euro"
    },{
...

As you can see, there are over 500 results for this query.


The combining query, using two codepoints to represent the accented 'e':

Query:
http://content.guardianapis.com/search?q=Socie%cc%81te%cc%81+Ge%cc%81ne%cc%81rale&format=json

Response:
{
  "response":{
    "status":"ok",
    "userTier":"free",
    "total":0,
    "startIndex":0,
    "pageSize":10,
    "currentPage":1,
    "pages":0,
    "orderBy":"newest",
    "didYouMean":"sofiété Générace",
    "results":[]
  }
}

This response shows that the query found 0 results and suggested something else entirely.
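
Once the difference is understood, the comparison is easy to re-run. A minimal Python sketch, assuming the endpoint still behaves as in these examples (the current API may also require an api-key parameter), might look like this:

import json
import unicodedata
from urllib.parse import quote_plus
from urllib.request import urlopen

BASE = "http://content.guardianapis.com/search?format=json&q="

def search(term):
    # quote_plus() percent-encodes the UTF-8 bytes of the term as-is,
    # preserving whichever Unicode form it happens to be in.
    with urlopen(BASE + quote_plus(term)) as resp:
        return json.load(resp)["response"]

precomposed = unicodedata.normalize("NFC", "Société Générale")  # single-codepoint é
combining = unicodedata.normalize("NFD", "Société Générale")    # e + combining accent

for label, term in (("precomposed", precomposed), ("combining", combining)):
    r = search(term)
    print(label, r["total"], r.get("didYouMean"))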

At this point it looked like there was an issue. But how could I be sure? Maybe the Unicode NFC behaviour was purely hypothetical, not used in reality. So I needed an oracle, something that would help me decide whether this behaviour was a bug. I switched to another news search system, one that generally seems reliable and would be respected in a comparison: Google News.

Firefox 5 (Mac OS X) renders these characters differently, but Google returns the same results.
Note: Google Chrome renders no discernible difference.

I used cURL to make two queries to the Google News site, using the two different forms of the query. This required a minor tweak, modifying cURL's user-agent to stop it being blocked by Google. The results showed that Google returned almost the same results for both versions of "Société Générale". There were some minor differences, but these appeared to be inconsistent and possibly unrelated. The significant feedback from these Google News pages was that Google returns many results for both forms of character representation, and those results are virtually identical. It would therefore appear that there is an issue with the Guardian's handling of these codepoints.
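
In a script, the equivalent of that user-agent tweak is just an extra request header. A rough Python sketch follows; the news-search URL and browser string here are illustrative assumptions rather than the precise values used:

from urllib.parse import quote
from urllib.request import Request, urlopen

# Illustrative values: any browser-like User-Agent string will do,
# and the exact news-search URL may differ.
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0"

def news_search(term):
    req = Request("https://news.google.com/search?q=" + quote(term),
                  headers={"User-Agent": USER_AGENT})
    with urlopen(req) as resp:
        return resp.read()

precomposed = news_search("Soci\u00e9t\u00e9 G\u00e9n\u00e9rale")
combining = news_search("Socie\u0301te\u0301 Ge\u0301ne\u0301rale")
print(len(precomposed), len(combining))   # crude size comparison of the two result pages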

Thanks to this investigation, we have learned of another possible limitation in the Guardian Search API, one that could mean a user would not find news related to an important and current news event. This kind of investigation is at the heart of good testing: results learned from testing are quickly analysed, compared with background knowledge and used to generate more and better tests. Tools are selected for their ability to support this process, increasing the clarity of our results without forcing us to write unneeded code in awkward DSLs.

Comments

  1. Really interesting blog Pete - thanks for posting. It's a great example of how exploratory testing can add value over just adding further automation.

    It also clearly shows your iterative thinking in trying to identify what potential 'nasty' bugs may be hiding behind an innocuous symptom of failure

  2. Interesting blog - especially because I am the product manager for the Guardian Open Platform. Our dev team and I would like to catch up with you - if nothing else, to say thank you properly for the nasties you caught :-)
    I can be reached at sharath.bulusu at guardian.co.uk. Looking forward to meeting.

