
Posts

Showing posts with the label critical thinking

Is your ChatBot actually using your data?

 In 316 AD, Emperor Constantine issued a new coin; there's nothing too unusual about that in itself. But this coin is significant due to its pagan Roman religious symbols. Why is this odd? Constantine had converted himself, and (probably with little consultation) his empire, to Christianity years before. Yet the coin shows the emperor and the (pagan) sun god Sol. Looks Legit! While this seems out of place to us, 1,700 years later, it's not entirely surprising. Constantine and his people had followed different, older gods for centuries. The people would have been raised on the old pagan stories, and when presented with a new narrative it's not surprising they borrowed from and felt comfortable with both. I've seen much the same behaviour with Large Language Models (LLMs) like ChatGPT. You can provide them with fresh new data from your own documents, but what's to stop them from listening to their old training instead? You could spend a lot of time collating,
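A simple way to probe this (a minimal sketch of my own, not code from the post) is to plant a fact in the supplied context that deliberately contradicts what the model will have seen in training, then check which version comes back. The ask_llm helper, prompts and planted fact here are hypothetical placeholders; wire them up to whatever client your stack actually uses.

```python
# Hypothetical check: does the chatbot answer from OUR context, or from its training?
# ask_llm() is a placeholder for whatever LLM client call your stack provides.

def ask_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("Wire this up to your own LLM client")

# A 'canary' fact, unlikely to match anything in the model's training data.
context = "Company policy document v7: Our support line closes at 3pm on Fridays."
question = "What time does the support line close on Fridays?"

answer = ask_llm(
    system_prompt=f"Answer ONLY from the context below.\n\nContext:\n{context}",
    user_prompt=question,
)

# If the answer doesn't echo the planted detail, the model is probably
# leaning on its old training rather than the data we supplied.
assert "3pm" in answer.lower().replace(" ", ""), f"Context ignored? Got: {answer}"
```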

Test Engineers, counsel for... all of the above!

Sometimes people discuss test engineers and QA as if they were a sort of police force, patrolling the streets of code looking for offences and offenders. While I can see the parallels (the investigation, checking the veracity of claims, a belief that we are making things safer), the simile soon falls down. Testers are not on the other side of the problem; we work alongside core developers, we often write code, and we follow all the same procedures (pull requests, planning, requirements analysis, etc.) that they do. We also have the same goals: the delivery of working software that fulfills the team's and company's goals and avoids harm. "A Few Good Men": a great courtroom drama, all about finding the truth. Software quality, whatever that means for you and your company, is helped by test engineers. Test engineers approach the problem from another vantage point. We are the lawyers (and their investigators) in the courtroom, sifting the evidence, questioning the facts and viewing t

Avoiding Wild Goose Chases While Debugging.

When I'm debugging a complex system, I'm constantly looking for patterns. I just ran this test code... what did I see in the log? I just processed a metric $^&*-load of data; did our memory footprint blip? I'm probably using every freedom-unit of screen space to tail logs, run a memory usage tool, run an IDE and debugger, watch a trace of API calls, run test code… And I'm doing this over and over. Then I see it. Bingo: that spike in API calls hits only when that process over there jumps to 20% processor usage and the app also throws that error. Unfortunately, I may have been mistaken. On a sufficiently complex system, the emergent behaviour can approach the appearance of randomness. Combinatorial explosions are for real, and they are happening constantly in your shiny new MacBook. My bug isn't what I think it is. I'm examining so many variables, in a system with dozens of subsystems at play, that it's inevitable I will see a correlation. We know this more formally as

Avoiding Death By Exposure

There's no such thing as a small bug. Customers, be they people or businesses, do not measure software bugs in metres, feet, miles or kilograms. They use measures like time wasted, life lost and money. Take a recent bug from Facebook. It affected thousands, maybe millions of customers, and the bottom line of companies (seemingly) unconnected with Facebook, such as Spotify, TikTok and SoundCloud, and probably countless smaller companies. So why did the journalist seem to think it was small? Too often we judge the systems we create by how likely they are to fail, given our narrow view of the world. A better measure is our exposure when the systems fail. The exposure for Facebook is a greater motivation for other companies to disentangle themselves from Facebook's SDK, or to promote a rival platform. It doesn't matter if our bug is one tiny assumption or one character out of place: if it stops a million people from using or buying an app, then it's a huge bug.

A h̶i̶t̶c̶h̶h̶i̶k̶e̶r̶'s̶ software tester's guide to randomised testing - Part 2

How would you test a water sac? (Whoa there, calm that tester brain... I know what you are thinking: What's it used for? Who or what uses it? How long does it need to last? Does the temperature of the water matter? Is it single use? etc. But let's assume a generic hiking or camping water sac for now.) I'm guessing one of your suggestions includes filling it with water, shaking it a bit and checking for leaks. Seems kind of obvious, right? But when it comes to software, we often do away with old-fashioned techniques such as filling something up and looking at it. Where's the machine learning test algorithm? Call this a BDD scenario? Can Selenium check for H₂O? I have to run this past the B.A... This is your software. We can treat randomly generated test data and inputs in much the same way as water. Data files or other inputs like user interactions are the ever-moving parts of our applications. Think about it: the code is entirely static - it's the state or data that i
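As a rough illustration of the 'water' idea (my sketch, not code from the post), you can pour large volumes of randomly generated data through a piece of code and simply look for leaks, i.e. crashes or broken invariants. The parse_config function here is a hypothetical stand-in for whatever part of your app swallows input.

```python
import random
import string

def parse_config(text: str) -> dict:
    """Hypothetical stand-in for the code under test."""
    result = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            result[key.strip()] = value.strip()
    return result

def random_text(max_len: int = 200) -> str:
    """Generate a random blob of printable characters - our 'water'."""
    length = random.randint(0, max_len)
    return "".join(random.choice(string.printable) for _ in range(length))

# Fill it up, shake it, and check for leaks: no exceptions, sane output type.
for i in range(1000):
    blob = random_text()
    parsed = parse_config(blob)          # should never raise
    assert isinstance(parsed, dict), f"Leak on iteration {i}: {blob!r}"
```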

Was there a test for that? No, and there shouldn't be.

The release shipped. For a while, the team felt good. The work was done, the team had achieved something, and that was rewarding. Unfortunately for the team, it wasn't long before a problem was found. The Product Owner wasn't happy and had asked what was going on down there in the galley; do we need new coders? Better ones? Hipster coders? The release shipped... After an investigation, some blushes, raised eyebrows and a couple of "Oh... Yeeeah"s, they found the cause. A confusion had collided with a bodge, and the result was a mess. Should they write an automated test for this problem? An embarrassing mistake or a misstep can make us feel we have to do something. An action greater than a fix is needed. A penitence needs to be performed, to redeem ourselves, to make us right again. Sometimes the penitence is best spent adding a test for that issue. Especially if writing that test has a low cost, the frequency of the problem occurring is high or

Manumation, the worst best practice.

There is a pattern I see with many clients, often enough that I sought out a word to describe it: Manumation, a sort of well-meaning automation that usually requires frequent, extensive and expensive intervention to keep it 'working'. You have probably seen it: the build server that needs a prod and a restart 'when things get a bit busy', the deployment tool that 'gets confused', the 'test suite' that just needs another run or three. The cause can be any number of the usual suspects - a corporate standard tool warped 5 ways to make it fit what your team needs, a one-off script 'that manager' decided was an investment and needed to be re-used, a well-intended attempt to 'automate all the things' that achieved the opposite. They result in a manually intensive automated process, where your team is like a character in the movie Metropolis, fighting with levers all day just to keep the lights on upstairs. Manual-automation, manu

The gamification of Software Testing

A while back, I sat in on a planning meeting. Many planning meetings slide awkwardly into a sort of ad-hoc technical analysis discussion, and this was no exception. With a little prompting, the team started to draw up what they wanted to build on a whiteboard. The picture spoke its thousand words, and I could feel that the team now understood what needed to be done. The right questions were being asked, and initial development guesstimates were approaching common-sense levels. The discussion came around to testing; skipping over how they might test the feature, the team focused immediately on how long testing would take. When probed as to how the testing would be performed, or how we might find out what the team did wrong, confused faces stared back at me. During our ensuing chat, I realised that they had been using BDD scenarios [only] as a metric of what testing needs to be done and when they are ready to ship. (Now I knew why I was hired to help.) There is nothing wrong with c

Pick a card...

Take a pack of cards, shuffle them well, and place them on the desk in front of you. Could you accurately tell me what the order of the cards would be, without looking? (Image: Rosapicci, own work, CC BY-SA 4.0.) Now spend 2 weeks in a software development team, writing code and using services, and deploy that code to your cloud server environments. Could you tell me where the bugs would be, before looking? In both cases we have a rough idea of what's in the product at the end. But the detail, how it's actually going to play out? We have no realistic idea. Skeptical? Take the playing cards… If we lay the cards out one card at a time, then the order in which they are laid out has probably never been seen before. Ever. The number of permutations of the well-known pack of 52 playing cards is 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000. OK, now let's get back to our code. Even trivial apps include dozens of code libr
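For the skeptical, that permutation count is just 52 factorial; a couple of lines of Python (my addition, not from the post) reproduce the figure quoted above.

```python
import math

# 52! = the number of distinct orderings of a standard 52-card pack
print(math.factorial(52))
# 80658175170943878571660636856403766975289505440883277824000000000000
# roughly 8.07 x 10^67 distinct shuffles
```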

Shutter Sync, when failure provides enlightenment

Shutter sync is an interesting artefact generated when we video moving objects. Take a look at this video of a helicopter taking off: notice how the boats are moving as normal, but the rotors appear to be barely moving at all. This isn't a 'Photoshop'. It's an effect of the video camera's frame rate matching the speed and position of the rotors. Each time the camera takes a picture, or 'frame', the rotors happen to be in approximately the same relative position. The regular and deterministic behaviour of both machines enables the helicopter to appear to be both broken and flying. The rotors don't appear to be working, while other evidence suggests they are providing all the lift required. What's so exciting is that this tells us something useful, as well as apparently being a flaw or failure. We could both assume the rotors move at a constant rate, and estimate a series of possible values for their speed, given this video. Your automated checks/tests can
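To make the 'estimate possible rotor speeds' idea concrete, here is a small sketch (my own, with assumed numbers) of the aliasing arithmetic: a rotor with n identical blades looks frozen whenever it turns an exact multiple of 1/n of a revolution between frames, so the candidate speeds are multiples of frame_rate / n revolutions per second.

```python
# Candidate rotor speeds that would look 'frozen' on camera.
# Assumed numbers: a 30 fps camera and a 5-bladed rotor (purely illustrative).
frame_rate = 30.0   # frames per second
blades = 5          # n-fold symmetry of the rotor

# The rotor looks stationary when it turns k/n of a revolution per frame,
# i.e. when its speed is k * frame_rate / n revolutions per second.
candidates_rps = [k * frame_rate / blades for k in range(1, 11)]
candidates_rpm = [round(rps * 60) for rps in candidates_rps]

print(candidates_rpm)  # [360, 720, 1080, ...] - compare against plausible rotor RPM
```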

Wrong in front of you.

In 2008 I attended GTAC in Seattle, a conference devoted to the use of automation in software testing. Since their first in London in 2006, Google have been running roughly one a year, in various locations around the world. This post isn't really about the conference; it's about a realisation that I had the day after the conference. After the conference I went sightseeing in Seattle. I rode the short Simpsons-like Monorail and took a lift up the Jetsons-like Space Needle. I enjoyed my time there, and found the people very friendly. The conference had been very technology focused; many (but granted, not all) speakers focused on tools and how to use tools. While useful, the tools are only part of testing - and even then, they typically just support testing rather than "do" testing. The Seattle Monorail. While I was at the top of the Space Needle, I took out my phone and, like a good tourist, started taking pictures. I'd typically take a couple of pictures then