Epic Failing

This last week I missed solutions to two different types of problems, and how I reacted to each type was so radically different that I thought I would take a moment to reflect.

The first type involved new feature discussions in the office. Usually, when the team starts discussing a new feature, one of the first questions asked is how much needs to change in order to support it. My preliminary thoughts on a certain feature were that it was going to involve a lot of infrastructure changes, which caused us to rethink adding the feature at all because of the complexity involved. However, on further discussion, James proposed a solution that involved changing just a few things here and there.

After reflecting on James's proposed solution, I realized he was completely right. I had overthought the problem, and we really could fully support the new feature with just a few minor tweaks to our existing codebase.

My initial response to this first type of miss was all-out pride. This is why I work with smart people: they see things I don't. Their input makes me a better programmer, working with them makes me a better CTO, and their hard work makes First Opinion a better company.

The second type of problem I missed involved failures in our backend systems. We had two significant downtime issues this week. The first happened early in the week: one of our long-running cron jobs had been failing for at least a day due to an input change.

The second happened early Saturday morning (or late Friday night, if you prefer). In my sleepy state I failed to fix the problem on my first stab, so it stayed unresolved until I was awoken a second time a few hours later.

Both of these issues were caused by monitoring failures in our systems, and my initial response to both was disappointment in myself for how badly I had screwed up, because neither issue should have existed for more than a few minutes at most.

Programmers screw up. Code is brittle, and it breaks a lot. But because epic-level screw-ups are so common in programming, you have to be careful about how you discuss the error after the fact. I don't know anyone who enjoys having their failures pointed out again, and again, and again. Likewise, I've never known a programmer who didn't feel just awful after an epic screw-up.

My focus instead turns to solving the issue that caused the epic failure. To do that, I like to use the 5 Whys to figure out what went wrong and to make sure the problem is never repeated. I also drop every other project on my plate until the solution is fully implemented. My team and I are going to fail spectacularly[1] again; it's just the nature of software development. But the goal is to never fail in the same way twice.

Supplementary Material

I liked these two quotes from David Sokol, quote one:

We are very tolerant of mistakes resulting from a judgmental error at the planning stage, when despite our team's best efforts, the market zigged and we zagged. It happens. We recognize the error, adjust from it and learn from it. We are not tolerant, however, of mistakes made from a lack of planning or diligence or from plain laziness. Tolerating such situations ultimately makes the organization very good at them.

And two:

Making a decision that delivers a less than desired outcome is part of business life. Failing to take the time to understand your mistakes and learn from them is totally unacceptable.

And this is a great pep talk from the Duke Women's Basketball Coach about how mistakes are part of the game:

During the game some things are going to go wrong it's what happens. If you really like watch the game of basketball, every possession something goes wrong for somebody.

That's how someone scores or that's how someone doesn't score because something goes wrong with every possession.

So there's 150 possessions in a game, 150 times someone's making a mistake. It's part of it so just don't let that throw you off. You make a mistake, it happens, but just get back into the play as quickly as you can and then try to make the next right play.

That's really the most important thing about being a basketball player is when you make a mistake try to make the next right play as many times as you can.


  1. Just to be clear, both of these issues were caused by me, no team needed. Despite some big wins this week, overall it was a tough week for my programming self-esteem.