Skip to main content

Posts

Showing posts with the label illusion

Vimes Boots & why the right AI evals could save your project

"The reason the rich were so rich, Vimes reasoned, was because They managed to spend less money" - Sam Vimes , from Men at Arms by Terry Pratchett. The theory goes that richer people can afford a pair of boots that last longer before repair or replacement than poor people, who can only afford cheaper and less hardy boots requiring replacement in just a ⅓ of the time. It’s the old "buy cheap buy twice" idea and it still applies today, especially in the world of AI agents and LLMs. Currently your agent is 'managed' or 'directed' at least in part by a harness. In fact, the IDE coding agent is really a harness + an LLM. The LLM “writes the code” and requests tool use, the harness enables those tool calls. E.g.: Write a file or run a command line tool etc. In one sense the LLM does the productive side of the agents work while the harness has a more deterministic & feedback role. (e.g. reports back that the command the LLM suggested returned resu...

Shutter Sync, when failure provides enlightenment

Shutter sync is an interesting artefact generated when we video moving objects. Take a look at this video of a Helicopter taking off: Notice how the boats are moving as normal, but the rotors appear to be barely moving at all. This isn’t a ‘Photoshop’. It’s an effect of video camera’s frame rate matching the speed/position of the rotors. Each time the camera takes a picture or ‘frame’ the rotors happen to be in approximately the same relative position. The regular and deterministic behaviour of both machines enables the helicopter to appear to be both broken and flying. The rotors don’t appear to be working, while other evidence suggests its rotors are providing all the lift required. What's so exciting is that this tells us something useful, as well as apparently being a flaw or fail. We could both assume the rotors move with a constant rotation, and estimate a series of possible values for the speed of the rotors, given this video. Your automated checks/tests can ...

Controlling software development

Do you ever feel like we do all this work and maybe we needn't of bothered? Things might have worked-out without our intervention. Or we are actually worse off, now, after the work? You're not alone. This is a common problem in any role where you need to investigate the effects of changes. What you're feeling is a lack of control . A control is a view of the world, without your work. It's an alternate view of the world where everything is the same except for your fix/hack/intervention. They behave like 3D TV, they let your mind's-eye 'see' the effects, by making them standout from the background. They are commonly used in scientific and especially pharmaceutical research studies. They let the researchers know how effective a treatment was, compared with similar patients who received placebo (or  older established medicine) pills rather than the new treatment. The researchers can tell whether, for example, a new flu remedy actually helped the patients. O...