
Posts

Bucket of trouble, if you don't keep an eye on AI

Ever tried to get a teenager to do more chores around the home? For those without this joy in their lives, I’ll let you in on a secret - it goes down like a bucket of sick. You can sometimes cajole them, sometimes bribe them and even threaten them (We’ll take away your laptop!). But at best, this has mixed results. You’ll often get an uptick in throughput - the rubbish and recycling will exit the apartment more often. But quality will suffer: the cacophony of bins being banged, the incessant grumbling and the milk cartons scattered about will lead you to question many of your life choices. This is often the case in life generally and software development in particular. The old adage “Faster, better, cheaper - pick any two” still holds true. Interestingly, this isn’t always bad news for those in the business. More often it's a problem for customers, be they other teams or actual customers. For example, if you provide the means to produce faster - more people may be buying you...

Can 'reasoning' LLMs help with recs data creation?

A nervous tourist glances back and forth between their phone and the street sign. They then rotate their phone 180 degrees, pause, blink and frown. The lost traveller flags a nearby ‘local’ (the passer-by has a dog on a lead). “Excuse me…” she squeaks, “How may I get to Tower Hill?” “Well, that’s a good one,” ponders the dog walker, “You know…” “Yes?” queries the tourist hopefully. “Yeah…” A long pause ensues, then, “Well, I wouldn’t start from here,” he states confidently. The tourist almost visibly deflates and starts looking for an exit. That’s often how we start off in software testing. Despite the flood of methodologies (tips on pairing, power of three-ing, backlog grooming, automating, refining and all the other …ings), we often find ourselves having to figure out and therefore ‘test’ a piece of software by using it. And that’s good. It’s powerful, and effective if done right. But, like our dog walker, we can sometimes find ourselves somewhere unfamiliar...

Don't be a Vogon, make it easy to access your test data!

The Hitchhiker's Guide to the Galaxy opens with an alien ship about to destroy the Earth, and the aliens saying we (mankind) should have been more prepared – as a notice had been on display quite clearly – on Alpha Centauri, the nearby star system, for 50 years. Seriously, people - what are you moaning about – get with the program? The book then continues with the theme of bureaucratic rigidity and shallow interpretations of limited data. E.g. the titular guide’s description of the entire Earth is one word: “Harmless”, but after extensive review the new edition will state: “Mostly harmless”. (Image: Arthur Dent argues with the Vogons about poor data access.) This rings true for much software testing work, especially with externally developed software, be that external to the team or external to the company. The same approaches that teams use to develop their locally developed software usually don’t work well. This leads to a large suite of shallow tests that are usually h...

Can Gen-AI understand Payments?

When it comes to rolling out updates to large complex banking systems, things can get messy quickly. Of course, the holy grail is to have each subsystem work well independently and to do some form of Pact or contract testing – reducing the complex and painful integration work. But nonetheless – at some point you are going to need to see if the dog and the pony can do their show together – and it's generally better to do that in a way that doesn’t make millions of pounds of transactions fail – in a highly public manner, in production. (This post is based on my recent lightning talk at PyData London.) For the last few years, I’ve worked in the world of high value, real time and cross border payments, and one of the sticking points in bank [software] integration is message generation. A lot of time is spent dreaming up and creating those messages, then maintaining what you have just built. The world of payments runs on messages; these days they are often XML messages – and they ...

Text to SWIFT - making data from prose (What possible use could Gen AI be to me? - Part 2)

As I write this, my dog is grumpily moving around the room, pausing intermittently to give me disappointed looks - looks that only my elderly mother could compete with. She (my dog) is annoyed by the robot vacuum cleaner. It's not been run for a while in that room - and it's making a noisy foray into dark corners in a valiant effort to cleanse the mess. Its grinding gears and the cloud of dust in its wake are not helping to ease the dog's nerves. The dog's pleading puppy dog eyes and emotions have of course been anthropomorphised - at least a bit - by me (my dog is 7 years old and weighs over 20kg - so has little to fear). That is - I've taken human feelings and mapped them onto my dog. I know she has emotions - but she lacks language - or at least a language that (1) we humans understand, (2) maps to the same phrases or concepts I'm using. But I'm human. That's how I think and how I interact with people and sometimes - machines. Deciphering the problem and representi...

What possible use could Gen AI be to me? (Part 1)

There’s a great scene in The Simpsons where the Monorail salesman comes to town and everyone (except Lisa of course) is quickly entranced by Monorail fever… He has an answer for every question and guess what? The Monorail will solve all the problems… somehow. The hype around Generative AI can seem a bit like that, and like Monorail-guy, the sales guys assure you Gen AI will solve all your problems - but can be pretty vague on the “how” part of the answer. So I’m going to provide a few short guides into how Generative AI (and other forms of Artificial Intelligence) can help you and your team. I’ll pitch the technical level differently for each one, and we’ll start with something fairly non-technical: Custom Chatbots. Chatbots these days have evolved from the crude web sales tools of ten years ago, designed to hoover up leads for the sales team. They can now provide informative answers to questions based on documents or websites. Take the most famous: Chat GPT 4. If we ignore the...

Is your ChatBot actually using your data?

In 316 AD Emperor Constantine issued a new coin; there's nothing too unique about that in itself. But this coin is significant due to its pagan/Roman religious symbols. Why is this odd? Constantine had converted himself and, probably with little consultation, his empire to Christianity years before. Yet the coin shows the emperor and the (pagan) sun god Sol. (Image: Looks Legit!) While this seems out of place to us (1700 years later), it's not entirely surprising. Constantine and his people had followed different, older gods for centuries. The people would have been raised and taught the old pagan stories, and when presented with a new narrative it's not surprising they borrowed from and felt comfortable with both. I've seen much the same behaviour with Large Language Models (LLMs) like ChatGPT. You can provide them with fresh new data, from your own documents, but what's to stop them from listening to their old training instead? You could spend a lot of time collati...

AI Muggins

I play a card game called cribbage. I often play it with my son. One interesting part of the game is the muggins rule. This means that you can claim points from other players' turns, if they miscount the score. The scoring is slightly nerve-racking, with each of us double and triple checking our scores to avoid falling foul of ‘muggins’; that’s part of the fun. But my son and I also find ourselves discussing other hands of cards, in a sort of alternate history version of the game. “So if I had a 7 instead of a 2 of hearts, then I’d get a double run and score at least 8 more points”. “Yes Dad, if you had different cards then you would likely have a different score, but you don’t,” he says while rolling his eyes. This sort of bitter-sweet history rewriting is a convenient tool for us to swallow the awkward truth of the real world. We often create alternate things to object to. Take Chat GPT 4 and tools like Copilot X. These are powerful tools, capable o...

I for one welcome our new AI helper.

I was lucky enough to have started my career in a small company and then in a start-up. Both provided me with an environment perfect for learning. I sat with experts who took time out of their day to help answer my questions. From them, I learned the basics of what I still use today. I’ve built on those foundations, but things would have been much harder if I didn’t have those foundational moments of my career. I’m not just talking about technical skills, but also the mentoring on how companies work, consulting and how to be better generally. But those technical skills were also a big part of it – and a part many people miss out on in their careers. The rise of Large Language Models like ChatGPT4 is rapidly helping to fill that gap – where people don’t have a technical mentor who can explain and help work through those technical problems. I’m no longer that junior team member – asking the dumb questions (OK, well usually I’m not) but even I find Chat GPT excellent at cons...

MicroPython + LoRaWAN = PyLoRaWAN

I recently open sourced a simple MicroPython library for LoRaWAN on the Raspberry Pi Pico. (If you are interested, you can find it on GitHub.) If you are unsure what that all means, let me unpack it for you... MicroPython is a slimmed down version of Python 3.x that works on microcontrollers like the Raspberry Pi Pico, and a host of other microcontroller boards. LoRaWAN is a wireless communication standard that is ideal for long range, low power and low bandwidth data transmission. It's based on a clever technique for making signals work well over distance, called LoRa. The library I've shared is a wrapper around the existing LoRaWAN support provided by the RAK Wireless 4200 board. The RAK4200 (affiliate link) essentially provides a modem that can establish a connection to the network and relay messages. It uses the traditional AT command syntax (used by the modems of yore!) The Pico and RAK4200 Evaluation board (there is also a UPS under the Pico there - that's...

Development and test environments - on demand at the press of a button (That actually work!)

“Works on my machine!” “Fails most epically on my test system!” “Oh, wait… it works on CI but fails in Test env 3.” Sound familiar? These sorts of conversations are thankfully a thing of the past. Wait, hold on - are you still having these sorts of conversations? That's probably because you are working somewhere where the development, test, production and CI servers are being created by people, painfully, once. (Image: Alexander the Great cutting through the Gordian knot of a particularly gnarly micro-service deployment.) You set up your laptop, you pray to the god of operating system patches and upgrades and hope that nothing ever changes (ever). You're gonna be the last person in the team to take that new macOS upgrade - let the rest of the team run through those minefields first. And the test systems? Last time you asked for a new one of those your programme manager ended up on new and stronger heart meds. Luckily, there are tools that can help. Gitpod, for example, allows yo...

Test Engineers, counsel for... all of the above!

Sometimes people discuss test engineers and QA as if they were a sort of police force, patrolling the streets of code looking for offences and offenders. While I can see the parallels (the investigation, checking the veracity of claims and a belief that we are making things safer), the simile soon falls down. Testers are not on the other side of the problem; we work alongside core developers, we often write code and we follow all the same procedures (pull requests, planning, requirements analysis etc.) they do. We also have the same goals: the delivery of working software that fulfills the team’s/company's goals and avoids harm. (Image: "A Few Good Men", a great courtroom drama, all about finding the truth.) Software quality, whatever that means for you and your company, is helped by Test Engineers. Test Engineers approach the problem from another vantage point. We are the lawyers (and their investigators) in the court-room, sifting the evidence, questioning the facts and viewing t...

Podcast: VW Dieselgate and the $33bn b̶u̶g̶ feature

This is the story behind the VW emissions scandal, which so far has cost the company over $33bn. We look into the technology issues VW faced and the investigations that uncovered the problem. The MP3 (Audio) file is available here.

Podcast: The Therac-25, buggy software that killed.

As part of an ongoing project to learn more about what we've got wrong to help us improve, I look at the Therac-25 incidents, a devastating collection of software failures that often rank in the top 10 of civilian radiation accidents. The Therac-25 radiation therapy device killed or injured 6 people across Canada and the United States. (Image: The Therac-25 was a room-sized machine; in this cut-away, you can see the computer terminal in the near-bottom left.) I look into two of the most severe bugs, why the manufacturer didn't fix them and what we can learn from their mistakes. The MP3 (Audio) file is available here.

Podcast: The Post Office Horizon Scandal

In this episode, we look at the Post Office Horizon scandal, in which an app caused what some people are describing as the largest miscarriage of justice in British legal history. We look at some bugs, the legal judgements and what might have gone wrong at the Post Office to allow things to go so off track. I analyse what we can learn from the disaster to help stop this from happening in our own projects. The MP3 (Audio) file is available here.

Podcast: Voting Machine Fail

We wind the clock back to November 2019 and investigate the failure of voting machines in Northampton County, Pa., USA. We break down what went wrong, what caused the problem and what we can learn about the risks of software development from this high profile incident. The show notes and transcripts are available for free.

Avoiding Wild Goose Chases While Debugging.

When I’m debugging a complex system, I’m constantly looking for patterns. I just ran this test code... What did I see in the log? I just processed a metric $^&*-load of data, did our memory footprint blip? I’m probably using every freedom-unit of screen space to tail logs, run a memory usage tool, run an IDE & debugger, watch a trace of API calls, run test code… And I’m doing this over and over. Then I see it. Bingo, that spike in API calls hits only when that process over there jumps to 20% processor usage when the app also throws that error. Unfortunately, I may have been mistaken. On a sufficiently complex system, the emergent behaviour can approach the appearance of randomness. Combinatorial explosions are for real, and they are happening constantly in your shiny new MacBook. My bug isn’t what I think it is. I’m examining so many variables in a system with dozens of subsystems at play, it's inevitable I will see a correlation. We know this more formally as...
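
The point about inevitably seeing a correlation can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the post: the "metrics" are independent random numbers standing in for unrelated things you might watch while debugging, and the counts (40 metrics, 10 observations each) and the 0.6 cut-off for a "strong" correlation are arbitrary, hypothetical choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_strong_pairs(num_metrics, num_samples, threshold, seed=0):
    """Count pairs of independent random 'metrics' whose |r| meets the threshold.

    Returns (strong_pairs, total_pairs). The metrics are unrelated by
    construction, so every strong pair found here is a coincidence.
    """
    rng = random.Random(seed)
    metrics = [
        [rng.random() for _ in range(num_samples)] for _ in range(num_metrics)
    ]
    strong = 0
    for i in range(num_metrics):
        for j in range(i + 1, num_metrics):
            if abs(pearson(metrics[i], metrics[j])) >= threshold:
                strong += 1
    total = num_metrics * (num_metrics - 1) // 2
    return strong, total


if __name__ == "__main__":
    # Hypothetical setup: 40 things watched, only 10 observations of each.
    strong, total = count_strong_pairs(num_metrics=40, num_samples=10, threshold=0.6)
    print(f"{strong} of {total} unrelated metric pairs look strongly correlated")
```

With only a handful of observations and hundreds of pairs to compare, some pairs will look related purely by chance; watching more variables or taking fewer samples makes this worse.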

Avoiding Death By Exposure

There's no such thing as a small bug. Customers, be they people or businesses, do not measure software bugs in metres, feet, miles or kilograms. They use measures like time wasted, life lost and money. Take a recent bug from Facebook. It affected thousands, maybe millions of customers and the bottom line of companies (seemingly) unconnected with Facebook such as Spotify, TikTok and SoundCloud, and probably countless smaller companies. So why did the journalist seem to think it was small? Too often we judge the systems we create by how likely they are to fail, given our narrow view of the world. A better measure is our exposure when the systems fail. The exposure for Facebook is a greater motivation for other companies to disentangle themselves from Facebook's SDK, or promote a rival platform. It doesn't matter if our bug is one tiny assumption or one character out of place, if it stops a million people from using or buying an app then it's a huge bug.  ...

Convexity in Predictive Value & Why Your Tests Are Flaky.

A long time ago, in a country far away, a cunning politician suggested a way to reduce crime. He stated that a simple test could be used to catch all the criminals. When tested, all the criminals would fail the test and be locked up. There’d be no need for expensive courts, crooked lawyers or long drawn out trials. The politician failed to give details of the test when pressed by journalists, stating that the test was very sensitive and they wouldn’t understand it. His supporters soon had their way and the politician was elected to office. On his first day in office, he deployed his national program of criminality-testing. Inevitably the details of the test leaked out. The test was simple and was indeed capable of ensuring 100% of criminals were detected. The test was: If the person is alive, find them guilty and lock them up. The test had a sensitivity of 100%, every single actual... real... bona fide criminal would fail the test and find themselves in prison. Unfo...
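
The politician's test can be put into numbers with a small worked example. This is a sketch only: the population and criminal counts are made up for illustration and are not from the post, and the functions use the standard textbook definitions of sensitivity, specificity and positive predictive value.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual criminals that the test flags as guilty."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of innocent people that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(true_pos, false_pos):
    """Share of people flagged as guilty who really are criminals."""
    return true_pos / (true_pos + false_pos)


# Hypothetical country: these figures are assumptions for illustration.
POPULATION = 1_000_000
CRIMINALS = 10_000

# "If the person is alive, find them guilty": everyone is flagged.
TRUE_POS = CRIMINALS                 # every criminal is caught
FALSE_NEG = 0                        # no criminal slips through
FALSE_POS = POPULATION - CRIMINALS   # every innocent person is also flagged
TRUE_NEG = 0                         # nobody is ever cleared

if __name__ == "__main__":
    print("sensitivity:", sensitivity(TRUE_POS, FALSE_NEG))
    print("specificity:", specificity(TRUE_NEG, FALSE_POS))
    print("positive predictive value:", positive_predictive_value(TRUE_POS, FALSE_POS))
```

Under these assumed numbers the test catches every criminal, yet only 1 in 100 of the people it locks up is actually a criminal, because a "guilty" result says nothing more than the base rate already did.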