Skip to main content

ID Skeptic

At a client site, a few years ago, I had an interesting discussion with a 'senior programmer'. Our discussion centred on a configurable home-page. A user could decide what news or other information etc, they wanted to display on their home page. They'd start off by being given a generic page - and the customer could add or remove certain types of content to customise their page. Once they saved the 'new look' site, their choices would be saved on the web-server.

The company didn't want to force people to login, or even make them sign-up for an account. The goal was to keep it simple for the user. But they needed a way to uniquely identify the users, so they used an existing feature of the website. The first time a user came to the site, they were cookied with a 'unique' id 'code'. We could then use this identifier as a key in our database to store the details of what the users had configured their homepage to look like.

The testers reading this will already be dreaming up but what if's and other questions regarding the plan. When I was told of the plan, it seemed logical and sensible. It has its flaws, and many of these were known and accepted. For example: The users settings would not easily be transferred from one computer to another etc. The cookies containing the ID code were likely to be deleted over time, for various reasons. They were known, but the benefits were considered greater than the known problems.

One thing did stand out when I heard the plan though. As a software tester, I've become more sensitive to absolutes; I suspect I notice certain words more than non-testers. When I hear Never, Always, Unique, Every and similar words it's like a red rag to a bull. (Maybe we should christen them matador-words, designed to engage the critical instincts in a good tester.)

In this instance, I've learned that unique is firstly a relative term and secondly rarely tested.

A relative term because many systems I've used and tested have had unique identifiers. Upon closer examination, they often end-up being unique to a development environment - but possibly repeated in test and live systems. Fine you think, until you see that developer’s test-orders might not be filtered out from real outgoing orders to suppliers. Or: unique to a country or language, until they attempt to merge databases to one large centralised and multinational system.

Sometimes it's more fundamental, they just aren't unique or maybe they are - but will soon 'run out'. A good example I've seen of an ID that wasn't unique is a timestamp. The idea being that a timestamp [to the millisecond etc] is unlikely to be assigned to multiple users. To be sure, a programmer might even synchronise the handout of the timestamps ensuring one is always before another. But what happens when you have 2 or even 100 servers? The chances of a clash soon become quite likely.

Another system I investigated had reliably unique identifiers or ‘IDs’, but the system was greedy and 'grabbed' many at start up. This combined with a flaky-system requiring many restarts per day - meant that the servers were 'burning' through vast amounts of the IDs and it was forecasted we would soon run out. A potentially quick fix - when we could see it coming, but it could of been a site-outing incident had it not been caught.

Another related issue I've seen is when the ID was unique, but the code used to look up matches for the ID was not correct, and didn't treat it as such. The comparison was essentially, if one value contained the other value - return true. This issue didn't show up at first, but did when one value was '6' and it was compared with e.g.: '156899' the 'match' was 'good'. Unfortunately that code was for restarting production servers and caused approx 80% of the production servers to be restarted at the same time.

In this example the IDs appeared at first glance to be good. They consisted of a mixture of letters and numbers and were over 60 characters long. This means there was a lot of 'ID-space' and therefore: very many possible unique IDs. The programmer correctly thought that this number was so huge that the system could hand out new IDs, essentially 'forever', without ever duplicating a single one. Any suggestion otherwise was clearly short-sighted, and failing to 'get the math'.

But actually, the programmer confused how the system 'should' work with how it 'does' work.  What we are testing is the uniqueness itself – that’s part of the system under test. We shouldn't assume that there might be bugs throughout the application - except in -that- critical but untested feature. When you question the behaviour of a piece of software, why stop when presented with a sacred cow in the form of the word 'Unique' or others such as 'Random'. It would be foolish to depend on the foundation of a unique ID when testing an application built on it, especially when given no evidence of it's 'goodness', other than faith.

What is often confusing to non-testers is that we question such things as 'Uniqueness'. As discussed above, the system could be capable of generating good 'unique' IDs, but it’s another thing to be confident that it is actually doing so. There are many reasons the system you’re working with isn't getting the unique input it requires. As a tester, questioning these assumptions and highlighting the risks we uncover provides valuable feedback to our customers.


  1. Good catch. Many times we assume that fields that are 'unique' or 'random' are actually so and fail to verify it. Infact we don't even test it assuming it is true. I was not even aware that I made this assumption till I read this post.. this article is an eye-opener. Writing automated scripts for generating random numbers in test environment is not the correct approach if we fail to validate what's really happening.

    Very interesting article.

    "The intersection of Technology and Leadership"

  2. Hi Pete!
    Regarding the in house algorithm for generating unique ID.
    Once upon a time, one 'developer' said:
    "I do not have to take care of this case! What is possibility that two users will try to do that at the same time!"
    I do not need to emphasize that this is our favorite byword when we are discussing parallel processing algorithm issues.

  3. I ones worked on a system that generated a 4 digit ID. Yes, the field was limited to 4 digits. The designer said: "This shouldn't be a problem, because it's only for updates of the current contract. There is a relationship ID/contract shouldn't become a problem. No-one updates a contract 9999 times".

    And in the beginning, it seemed plausible. The ID was increased by 1 (based on the previous ID) for situation X and by 2 (based on the previous ID) for situation Y (yes, still a bit of waste, but it was acceptable for them). So no problem... yet.

    Until someone started creating new records with contract/ID increasing by 10 for situation A and by 11 for situation B. Now the rate gets up even faster. But still, even after i raised this as a possible problen they dismissed it as "shouldn't become a problem".

    It might not become a problem... until i discovered how the previous ID was determined. It seemed the ID wasn't based on the previous ID/contract combination, but just on the previous ID..... the highest already used ID in the whole dataset! And a quick check in the production data reveiled that this ID already reached approx. 6000! Now imagine the time before reaching 9999.

    Finally, they changed the code. But only for the last part.


Post a Comment

Popular posts from this blog

Why you might need testers

I remember teaching my son to ride his bike. No, Strike that, Helping him to learn to ride his bike. It’s that way round – if we are honest – he was changing his brain so it could adapt to the mechanism and behaviour of the bike. I was just holding the bike, pushing and showering him with praise and tips.
If he fell, I didn’t and couldn’t change the way he was riding the bike. I suggested things, rubbed his sore knee and pointed out that he had just cycled more in that last attempt – than he had ever managed before - Son this is working, you’re getting it.
I had help of course, Gravity being one. When he lost balance, it hurt. Not a lot, but enough for his brain to get the feedback it needed to rewire a few neurons. If the mistakes were subtler, advice might help – try going faster – that will make the bike less wobbly. The excitement of going faster and better helped rewire a few more neurons.
When we have this sort of immediate feedback we learn quicker, we improve our game. When the f…

Thank you for finding the bug I missed.

Thank you to the colleague/customer/product owner, who found the bug I missed. That oversight, was (at least in part) my mistake. I've been thinking about what happened and what that means to me and my team.

I'm happy you told me about the issue you found, because you...

1) Opened my eyes to a situation I'd never have thought to investigate.

2) Gave me another item for my checklist of things to check in future.

3) Made me remember, that we are never done testing.

4) Are never sure if the application 'works' well enough.

5) Reminded me to explore more and build less.

6) To request that we may wish to assign more time to finding these issues.

7) Let me experience the hindsight bias, so that the edge-case now seems obvious!

Being a square keeps you from going around in circles.

After a weary few hours sorting through, re-running and manually double checking the "automated test" results, the team decide they need to "run the tests again!", that's a problem to the team. Why? because they are too slow. The 'test' runs take too long and they won't have the results until tomorrow.
How does our team intend to fix the problem? ... make the tests run faster. Maybe use a new framework, get better hardware or some other cool trick. The team get busy, update the test tools and soon find them selves in a similar position. Now of course they need to rewrite them in language X or using a new [A-Z]+DD methodology. I can't believe you are still using technology Z , Luddites!
Updating your tooling, and using a methodology appropriate to your context makes sense and should be factored into your workflow and estimates. But the above approach to solving the problem, starts with the wrong problem. As such, its not likely to find the right ans…