Posts

Development and test environments - on demand at the press of a button (That actually work!)

“Works on my machine!” “Fails most epically on my test system!” “Oh, wait… it works on CI but fails in Test env 3.” Sound familiar? These sorts of conversations are thankfully a thing of the past. Wait, hold on - are you still having these sorts of conversations? That's probably because you are working somewhere where the development, test, production & CI servers are created by people, painfully, once - Alexander the Great cutting through the Gordian knot of a particularly gnarly micro-service deployment. You set up your laptop, you pray to the god of operating system patches and upgrades, and you hope that nothing ever changes (ever). You're gonna be the last person in the team to take that new macOS upgrade - let the rest of the team run through those minefields first. And the test systems? The last time you asked for a new one of those, your programme manager ended up on new & stronger heart meds. Luckily, there are tools that can help. Gitpod, for example, allows you...

Test Engineers, counsel for... all of the above!

Sometimes people discuss test engineers and QA as if they were a sort of police force, patrolling the streets of code looking for offences and offenders. While I can see the parallels - the investigation, the checking of the veracity of claims, and a belief that we are making things safer - the simile soon falls down. Testers are not on the other side of the problem: we work alongside core developers, we often write code, and we follow all the same procedures they do (pull requests, planning, requirements analysis etc.). We also have the same goals: the delivery of working software that fulfils the team's and company's goals and avoids harm. "A Few Good Men" is a great courtroom drama, all about finding the truth. Software quality, whatever that means for you and your company, is helped by Test Engineers. Test Engineers approach the problem from another vantage point. We are the lawyers (& their investigators) in the courtroom, sifting the evidence, questioning the facts and viewing t...

Podcast: VW Dieselgate and the $33bn b̶u̶g̶ feature

This is the story behind the VW emissions scandal, which has so far cost the company over $33bn. We look into the technology issues VW faced and the investigations that uncovered the problem. The MP3 (Audio) file is available here.

Podcast: The Therac-25, buggy software that killed.

As part of an ongoing project to learn more about what we've got wrong, to help us improve, I look at the Therac-25 incidents, a devastating collection of software failures that often rank in the top 10 of civilian radiation accidents. The Therac-25, a room-sized radiation therapy machine, killed or injured six people across Canada and the United States. I look into two of the most severe bugs, why the manufacturer didn't fix them, and what we can learn from their mistakes. The MP3 (Audio) file is available here.

Podcast: The Post Office Horizon Scandal

In this episode, we look at the Post Office Horizon scandal, an app that caused what some people are describing as the largest miscarriage of justice in British legal history. We look at some bugs, the legal judgements and what might have gone wrong at the Post Office to allow things to go so far off track. I analyse what we can learn from the disaster to help stop this from happening in our own projects. The MP3 (Audio) file is available here.

Podcast: Voting Machine Fail

We wind the clock back to November 2019 and investigate the failure of voting machines in Northampton County, Pa., USA. We break down what went wrong, what caused the problem and what we can learn about the risks of software development from this high-profile incident. The show notes and transcripts are available for free.

Avoiding Wild Goose Chases While Debugging.

When I’m debugging a complex system, I’m constantly looking for patterns. I just ran this test code... What did I see in the log? I just processed a metric $^&*-load of data, did our memory footprint blip? I’m probably using every freedom-unit of screen space to tail logs, run a memory usage tool, run an IDE & debugger, watch a trace of API calls, run test code… And I’m doing this over and over. Then I see it. Bingo: that spike in API calls hits only when that process over there jumps to 20% processor usage and the app also throws that error. Unfortunately, I may have been mistaken. On a sufficiently complex system, the emergent behaviour can approach the appearance of randomness. Combinatorial explosions are for real, and they are happening constantly in your shiny new MacBook. My bug isn’t what I think it is. I’m examining so many variables, in a system with dozens of subsystems at play, that it's inevitable I will see a correlation. We know this more formally as...
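Not from the post itself, but here is a quick Python sketch of that last point: generate a pile of completely independent random "metrics" and you will still find pairs that look correlated. The metric count, sample size and threshold below are all made up purely for illustration.

```python
# Illustrative sketch: independent random "metrics" still produce
# pairs that look correlated, purely by chance.
import numpy as np

rng = np.random.default_rng(42)

n_metrics = 40    # imagine CPU %, memory, API call rate, error counts, queue depth...
n_samples = 100   # observations of each metric
metrics = rng.normal(size=(n_metrics, n_samples))   # independent by construction

corr = np.corrcoef(metrics)                          # 40 x 40 pairwise correlations
pairs = corr[np.triu_indices(n_metrics, k=1)]        # each unique pair once (780 of them)

threshold = 0.25                                     # arbitrary "ooh, that looks like a clue" level
suspicious = np.abs(pairs) > threshold
print(f"{n_metrics} unrelated metrics, {pairs.size} pairs, "
      f"{suspicious.sum()} correlate beyond ±{threshold} by pure chance")
```

Change the seed and the "suspicious" pairs move around, but some always turn up - which is exactly what makes pattern-hunting on a busy system so treacherous.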