ID Skeptic

A few years ago, at a client site, I had an interesting discussion with a 'senior programmer'. Our discussion centred on a configurable home page. Users could decide what news or other information they wanted displayed on their home page. They'd start off with a generic page, and could add or remove certain types of content to customise it. Once they saved the 'new look' site, their choices would be stored on the web server.

The company didn't want to force people to log in, or even make them sign up for an account. The goal was to keep it simple for the user. But they needed a way to uniquely identify users, so they used an existing feature of the website. The first time a user came to the site, they were cookied with a 'unique' ID code. We could then use this identifier as a key in our database to store the details of how each user had configured their home page.
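A minimal sketch of the scheme as I understood it (the names, and the use of UUIDs, are my illustration, not the site's actual code):

```python
import uuid

settings_store = {}  # server-side: visitor id -> saved home-page settings

def get_or_assign_id(cookies):
    """Return the visitor's id, minting a 'unique' one on their first visit."""
    if "visitor_id" not in cookies:
        cookies["visitor_id"] = uuid.uuid4().hex  # assumed unique
    return cookies["visitor_id"]

def save_homepage(cookies, settings):
    """Key the user's chosen settings on their cookie id."""
    settings_store[get_or_assign_id(cookies)] = settings

# First visit: the user is cookied, then customises their page.
cookies = {}
save_homepage(cookies, {"news": ["technology"], "weather": "local"})
```

The whole design rests on that one ID-minting call (whatever the real site used) genuinely never repeating - which is exactly the claim worth testing.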

The testers reading this will already be dreaming up 'but what if's and other questions regarding the plan. When I was told of it, it seemed logical and sensible. It has its flaws, and many of these were known and accepted. For example: the user's settings could not easily be transferred from one computer to another, and the cookies containing the ID code were likely to be deleted over time, for various reasons. These issues were known, but the benefits were considered greater than the known problems.

One thing did stand out when I heard the plan though. As a software tester, I've become more sensitive to absolutes; I suspect I notice certain words more than non-testers. When I hear Never, Always, Unique, Every and similar words, it's like a red rag to a bull. (Maybe we should christen them matador-words, designed to engage the critical instincts in a good tester.)

In this instance, I've learned that 'unique' is firstly a relative term and secondly one that is rarely tested.

A relative term because many systems I've used and tested have had unique identifiers. Upon closer examination, they often end up being unique to a development environment, but possibly repeated in test and live systems. Fine, you think, until you see that developers' test orders might not be filtered out from real outgoing orders to suppliers. Or unique to a country or language, until someone attempts to merge the databases into one large centralised, multinational system.

Sometimes it's more fundamental: they just aren't unique, or maybe they are but will soon 'run out'. A good example I've seen of an ID that wasn't unique is a timestamp. The idea being that a timestamp [to the millisecond etc.] is unlikely to be assigned to multiple users. To be sure, a programmer might even synchronise the handing-out of timestamps, ensuring one is always before another. But what happens when you have 2 or even 100 servers? The chances of a clash soon become quite likely.
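A quick illustration (my own Python sketch) of why millisecond timestamps make poor IDs even on a single machine, never mind 100 servers:

```python
import time

def timestamp_id():
    # An 'ID' that is just the current time, to the millisecond
    return int(time.time() * 1000)

# Hand out 1000 ids as fast as a real request burst might demand them
ids = [timestamp_id() for _ in range(1000)]
duplicates = len(ids) - len(set(ids))
# On any modern machine most of these 1000 'unique' ids collide,
# because far more than one is handed out per millisecond.
```

If even one process collides with itself this easily, multiple unsynchronised servers have no chance.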

Another system I investigated had reliably unique identifiers or 'IDs', but the system was greedy and 'grabbed' many at start-up. This, combined with a flaky system requiring many restarts per day, meant that the servers were 'burning' through vast numbers of IDs, and it was forecast we would soon run out. A potentially quick fix when we could see it coming, but it could have been a site-outage incident had it not been caught.
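The failure mode can be sketched like this (the numbers are invented for illustration, not from the real system):

```python
ID_SPACE = 10_000_000   # total ids available (hypothetical)
BLOCK_SIZE = 100_000    # ids 'grabbed' by a server at each start-up

next_free = 0

def allocate_block():
    """Reserve a block of ids; any unused ones are lost when the server restarts."""
    global next_free
    if next_free + BLOCK_SIZE > ID_SPACE:
        raise RuntimeError("id space exhausted")
    block = range(next_free, next_free + BLOCK_SIZE)
    next_free += BLOCK_SIZE
    return block

# A flaky server restarting many times a day abandons each block,
# so the space is exhausted after a fixed number of restarts, no
# matter how few ids were actually handed to users.
restarts_until_exhaustion = ID_SPACE // BLOCK_SIZE  # here, just 100
```

With several restarts per day across a fleet of servers, 'vast' ID spaces shrink to weeks of headroom.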

Another related issue I've seen is when the ID was unique, but the code used to look up matches for the ID was not correct, and didn't treat it as such. The comparison was essentially: if one value contains the other, return true. This issue didn't show up at first, but it did when one value was '6' and it was compared with, e.g., '156899' - the 'match' was 'good'. Unfortunately that code was for restarting production servers, and it caused approximately 80% of the production servers to be restarted at the same time.
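The bug boils down to a 'contains' check where an equality check was needed. Roughly (my reconstruction, not the original code):

```python
def ids_match_buggy(a, b):
    # The flawed lookup: treats a substring as a match
    return a in b or b in a

def ids_match_fixed(a, b):
    # What was intended: an exact comparison
    return a == b

# '6' is a substring of '156899', so the buggy version reports a
# match - and the wrong servers get restarted.
```

Perfectly unique IDs offer no protection when the comparison logic quietly discards their uniqueness.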

In this example the IDs appeared at first glance to be good. They consisted of a mixture of letters and numbers and were over 60 characters long. This means there was a lot of 'ID-space' and therefore very many possible unique IDs. The programmer correctly thought that this number was so huge that the system could hand out new IDs, essentially 'forever', without ever duplicating a single one. Any suggestion otherwise was clearly short-sighted, and failing to 'get the math'.
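The programmer's intuition about the maths is sound; a standard birthday-problem approximation shows why (the figures here are illustrative):

```python
import math

def collision_probability(n, space):
    """P(any duplicate among n ids drawn uniformly at random from `space`),
    via the approximation 1 - e^(-n(n-1)/2N)."""
    return 1 - math.exp(-n * (n - 1) / (2 * space))

big_space = 36 ** 60    # 60 alphanumeric characters, roughly 10^93 ids
tiny_space = 1_000_000  # a 6-digit numeric id, for contrast

# A thousand draws from a million-id space already collide roughly
# 39% of the time; a trillion draws from the 60-character space
# effectively never do.
p_tiny = collision_probability(1_000, tiny_space)
p_big = collision_probability(10**12, big_space)
```

But the maths only holds if the generator really draws uniformly at random from that space - and that is precisely the untested assumption.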

But actually, the programmer confused how the system 'should' work with how it 'does' work. What we are testing is the uniqueness itself - that's part of the system under test. We shouldn't assume that there might be bugs throughout the application except in -that- critical but untested feature. When you question the behaviour of a piece of software, why stop when presented with a sacred cow in the form of the word 'Unique', or others such as 'Random'? It would be foolish to depend on the foundation of a unique ID when testing an application built on it, especially when given no evidence of its 'goodness' other than faith.

What is often confusing to non-testers is that we question such things as 'Uniqueness'. As discussed above, the system could be capable of generating good 'unique' IDs, but it's another thing to be confident that it is actually doing so. There are many reasons why the system you're working with might not be getting the unique input it requires. As a tester, questioning these assumptions and highlighting the risks we uncover provides valuable feedback to our customers.


  1. Good catch. Many times we assume that fields that are 'unique' or 'random' are actually so, and fail to verify it. In fact we don't even test it, assuming it is true. I was not even aware that I made this assumption till I read this post - this article is an eye-opener. Writing automated scripts for generating random numbers in a test environment is not the correct approach if we fail to validate what's really happening.

    Very interesting article.

    "The intersection of Technology and Leadership"

  2. Hi Pete!
    Regarding the in house algorithm for generating unique ID.
    Once upon a time, one 'developer' said:
    "I do not have to take care of this case! What is possibility that two users will try to do that at the same time!"
    I do not need to emphasize that this is our favorite byword when we are discussing parallel processing algorithm issues.

  3. I once worked on a system that generated a 4-digit ID. Yes, the field was limited to 4 digits. The designer said: "This shouldn't be a problem, because it's only for updates of the current contract. There is a relationship between ID and contract, so it shouldn't become a problem. No-one updates a contract 9999 times".

    And in the beginning, it seemed plausible. The ID was increased by 1 (based on the previous ID) for situation X and by 2 (based on the previous ID) for situation Y (yes, still a bit of waste, but it was acceptable for them). So no problem... yet.

    Until someone started creating new records with the contract/ID increasing by 10 for situation A and by 11 for situation B. Now the rate goes up even faster. But still, even after I raised this as a possible problem, they dismissed it as "shouldn't become a problem".

    It might not become a problem... until I discovered how the previous ID was determined. It seemed the ID wasn't based on the previous ID/contract combination, but just on the previous ID... the highest already-used ID in the whole dataset! And a quick check in the production data revealed that this ID had already reached approx. 6000! Now imagine the time before reaching 9999.

    Finally, they changed the code. But only for the last part.

