Skip to main content

ID Skeptic

At a client site, a few years ago, I had an interesting discussion with a 'senior programmer'. Our discussion centred on a configurable home-page. A user could decide what news or other information etc, they wanted to display on their home page. They'd start off by being given a generic page - and the customer could add or remove certain types of content to customise their page. Once they saved the 'new look' site, their choices would be saved on the web-server.

The company didn't want to force people to login, or even make them sign-up for an account. The goal was to keep it simple for the user. But they needed a way to uniquely identify the users, so they used an existing feature of the website. The first time a user came to the site, they were cookied with a 'unique' id 'code'. We could then use this identifier as a key in our database to store the details of what the users had configured their homepage to look like.

The testers reading this will already be dreaming up but what if's and other questions regarding the plan. When I was told of the plan, it seemed logical and sensible. It has its flaws, and many of these were known and accepted. For example: The users settings would not easily be transferred from one computer to another etc. The cookies containing the ID code were likely to be deleted over time, for various reasons. They were known, but the benefits were considered greater than the known problems.

One thing did stand out when I heard the plan though. As a software tester, I've become more sensitive to absolutes; I suspect I notice certain words more than non-testers. When I hear Never, Always, Unique, Every and similar words it's like a red rag to a bull. (Maybe we should christen them matador-words, designed to engage the critical instincts in a good tester.)

In this instance, I've learned that unique is firstly a relative term and secondly rarely tested.

A relative term because many systems I've used and tested have had unique identifiers. Upon closer examination, they often end-up being unique to a development environment - but possibly repeated in test and live systems. Fine you think, until you see that developer’s test-orders might not be filtered out from real outgoing orders to suppliers. Or: unique to a country or language, until they attempt to merge databases to one large centralised and multinational system.

Sometimes it's more fundamental, they just aren't unique or maybe they are - but will soon 'run out'. A good example I've seen of an ID that wasn't unique is a timestamp. The idea being that a timestamp [to the millisecond etc] is unlikely to be assigned to multiple users. To be sure, a programmer might even synchronise the handout of the timestamps ensuring one is always before another. But what happens when you have 2 or even 100 servers? The chances of a clash soon become quite likely.

Another system I investigated had reliably unique identifiers or ‘IDs’, but the system was greedy and 'grabbed' many at start up. This combined with a flaky-system requiring many restarts per day - meant that the servers were 'burning' through vast amounts of the IDs and it was forecasted we would soon run out. A potentially quick fix - when we could see it coming, but it could of been a site-outing incident had it not been caught.

Another related issue I've seen is when the ID was unique, but the code used to look up matches for the ID was not correct, and didn't treat it as such. The comparison was essentially, if one value contained the other value - return true. This issue didn't show up at first, but did when one value was '6' and it was compared with e.g.: '156899' the 'match' was 'good'. Unfortunately that code was for restarting production servers and caused approx 80% of the production servers to be restarted at the same time.

In this example the IDs appeared at first glance to be good. They consisted of a mixture of letters and numbers and were over 60 characters long. This means there was a lot of 'ID-space' and therefore: very many possible unique IDs. The programmer correctly thought that this number was so huge that the system could hand out new IDs, essentially 'forever', without ever duplicating a single one. Any suggestion otherwise was clearly short-sighted, and failing to 'get the math'.

But actually, the programmer confused how the system 'should' work with how it 'does' work.  What we are testing is the uniqueness itself – that’s part of the system under test. We shouldn't assume that there might be bugs throughout the application - except in -that- critical but untested feature. When you question the behaviour of a piece of software, why stop when presented with a sacred cow in the form of the word 'Unique' or others such as 'Random'. It would be foolish to depend on the foundation of a unique ID when testing an application built on it, especially when given no evidence of it's 'goodness', other than faith.

What is often confusing to non-testers is that we question such things as 'Uniqueness'. As discussed above, the system could be capable of generating good 'unique' IDs, but it’s another thing to be confident that it is actually doing so. There are many reasons the system you’re working with isn't getting the unique input it requires. As a tester, questioning these assumptions and highlighting the risks we uncover provides valuable feedback to our customers.

Comments

  1. Good catch. Many times we assume that fields that are 'unique' or 'random' are actually so and fail to verify it. Infact we don't even test it assuming it is true. I was not even aware that I made this assumption till I read this post.. this article is an eye-opener. Writing automated scripts for generating random numbers in test environment is not the correct approach if we fail to validate what's really happening.

    Very interesting article.

    Thanks,
    Aruna
    http:\\technologyandleadership.com
    "The intersection of Technology and Leadership"

    ReplyDelete
  2. Hi Pete!
    Regarding the in house algorithm for generating unique ID.
    Once upon a time, one 'developer' said:
    "I do not have to take care of this case! What is possibility that two users will try to do that at the same time!"
    I do not need to emphasize that this is our favorite byword when we are discussing parallel processing algorithm issues.

    ReplyDelete
  3. I ones worked on a system that generated a 4 digit ID. Yes, the field was limited to 4 digits. The designer said: "This shouldn't be a problem, because it's only for updates of the current contract. There is a relationship ID/contract shouldn't become a problem. No-one updates a contract 9999 times".

    And in the beginning, it seemed plausible. The ID was increased by 1 (based on the previous ID) for situation X and by 2 (based on the previous ID) for situation Y (yes, still a bit of waste, but it was acceptable for them). So no problem... yet.

    Until someone started creating new records with contract/ID increasing by 10 for situation A and by 11 for situation B. Now the rate gets up even faster. But still, even after i raised this as a possible problen they dismissed it as "shouldn't become a problem".

    It might not become a problem... until i discovered how the previous ID was determined. It seemed the ID wasn't based on the previous ID/contract combination, but just on the previous ID..... the highest already used ID in the whole dataset! And a quick check in the production data reveiled that this ID already reached approx. 6000! Now imagine the time before reaching 9999.

    Finally, they changed the code. But only for the last part.

    ReplyDelete

Post a Comment

Popular posts from this blog

What possible use could Gen AI be to me? (Part 1)

There’s a great scene in the Simpsons where the Monorail salesman comes to town and everyone (except Lisa of course) is quickly entranced by Monorail fever… He has an answer for every question and guess what? The Monorail will solve all the problems… somehow. The hype around Generative AI can seem a bit like that, and like Monorail-guy the sales-guy’s assure you Gen AI will solve all your problems - but can be pretty vague on the “how” part of the answer. So I’m going to provide a few short guides into how Generative (& other forms of AI) Artificial Intelligence can help you and your team. I’ll pitch the technical level differently for each one, and we’ll start with something fairly not technical: Custom Chatbots. ChatBots these days have evolved from the crude web sales tools of ten years ago, designed to hoover up leads for the sales team. They can now provide informative answers to questions based on documents or websites. If we take the most famous: Chat GPT 4. If we ignore the

Can Gen-AI understand Payments?

When it comes to rolling out updates to large complex banking systems, things can get messy quickly. Of course, the holy grail is to have each subsystem work well independently and to do some form of Pact or contract testing – reducing the complex and painful integration work. But nonetheless – at some point you are going to need to see if the dog and the pony can do their show together – and its generally better to do that in a way that doesn’t make millions of pounds of transactions fail – in a highly public manner, in production.  (This post is based on my recent lightning talk at  PyData London ) For the last few years, I’ve worked in the world of high value, real time and cross border payments, And one of the sticking points in bank [software] integration is message generation. A lot of time is spent dreaming up and creating those messages, then maintaining what you have just built. The world of payments runs on messages, these days they are often XML messages – and they can be pa

Is your ChatBot actually using your data?

 In 316 AD Emperor Constantine issued a new coin,  there's nothing too unique about that in itself. But this coin is significant due to its pagan/roman religious symbols. Why is this odd? Constantine had converted himself, and probably with little consultation -  his empire to Christianity, years before. Yet the coin shows the emperor and the (pagan) sun god Sol.  Looks Legit! While this seems out of place, to us (1700 years later), it's not entirely surprising. Constantine and his people had followed different, older gods for centuries. The people would have been raised and taught the old pagan stories, and when presented with a new narrative it's not surprising they borrowed from and felt comfortable with both. I've seen much the same behaviour with Large Language Models (LLMs) like ChatGPT. You can provide them with fresh new data, from your own documents, but what's to stop it from listening to its old training instead?  You could spend a lot of time collating,