Skip to main content

Posts

Test Engineers, counsel for... all of the above!

Sometimes people discuss test engineers and QA as if they were a sort of police force, patrolling the streets of code looking for offences and offenders. While I can see the parallels, the investigation, checking the veracity of claims and a belief that we are making things safer. The simile soon falls down. But testers are not on the other side of the problem, we work alongside core developers, we often write code and follow all the same procedures (pull requests, planning, requirements analysis etc) they do. We also have the same goals, the delivery of working software that fulfills the team’s/company's goals and avoids harm. "A few good men" a great courtroom drama, all about finding the truth. Software quality, whatever that means for you and your company is helped by Test Engineers. Test Engineers approach the problem from another vantage point. We are the lawyers (& their investigators) in the court-room, sifting the evidence, questioning the facts and viewing t
Recent posts

Podcast: VW Dieselgate and the $33bn b̶u̶g̶ feature

This is the story behind the VW emissions scandal, that so far has cost the company over $33bn.  ( The show notes and transcripts are available for free here. ) We look into the technology issues VW faced and the investigations that uncovered the problem. The show notes and transcripts are available for free .

Podcast: The Therac-25, buggy software that killed.

As part of an ongoing project to learn more about what we've got wrong to help us improve, I look at the Therac-25 incidents, a devastating collection of software failures that often rank in the top 10 of civilian radiation accidents. The Therac-25 radiation therapy device killed or injured 6 people across Canada and the United States. The Therac-25 was a room-sized machine, in this cut-away, you can see the computer terminal in the near-bottom left. I look into two of the most severe bugs. Why the manufacturer didn't fix them and what we can learn from their mistakes. The show notes and transcripts are available for free .

Podcast: The Post Office Horizon Scandal

In this episode, we look at the Post Office Horizon scandal, an app that caused what some people are describing as the largest miscarriage of justice in British legal history. We look at some bugs, the legal judgements and what might have gone wrong at the Post Office to allow things to go so off track. I analyse what we can learn from the disaster to help stop this from happening in our own projects. The show notes and transcripts are available for free .

Podcast: Voting Machine Fail

We wind the clock back to November 2019 and investigate the failure of voting machines in Northampton County, Pa., USA. We break down what went wrong, what caused the problem and what we can learn about the risks of software development from this high profile incident. The show notes and transcripts are  available for free .

Avoiding Wild Goose Chases While Debugging.

When I’m debugging a complex system, I’m constantly looking for patterns. I just ran this test code... What did I see in the log? I just processed a metric $^&*-load of data, did our memory footprint blip? I’m probably using every freedom-unit of screen space to tail logs, run a memory usage tool, run an IDE & debugger, watch a trace of API calls, run test code… And I’m doing this over and over. Then I see it. Bingo, that spike in API calls hits only when that process over there jumps to 20% processor usage when the app also throws that error. Unfortunately, I may have been mistaken. On a sufficiently complex system, the emergent behaviour can approach the appearance of randomness. Combinatorial explosions are for real, and they are happening constantly in your shiny new MacBook. My bug isn’t what I think it is. I’m examining so many variables in a system with dozens of subsystems at play, it's inevitable I will see a correlation. We know this more formally as

Avoiding Death By Exposure

There's no such thing as a small bug. Customers, be they people or businesses, do not measure Software bugs in metres, feet or miles or kilograms. They use measures like time wasted, life-lost and money.  Take a recent bug from Facebook . It affected thousands, maybe millions of customers and the bottom line of companies (seemingly) unconnected with Facebook such as Spotify, Tik-Tok and SoundCloud, and probably countless smaller companies. So why did the journalist seem to think it was small? Too often we judge the systems we create by how likely they are to fail, given our narrow view of the world. A better measure is our exposure when the systems fail . The exposure for Facebook is a greater motivation for other companies to disentangle themselves from Facebook's SDK, or promote a rival platform. It doesn't matter if our bug is one tiny assumption or one character out of place, if it stops a million people from using or buying an app then it's a huge bug.