The hidden dangers of Process Debt

Most of us involved in software development are familiar with the term “technical debt”.

As a quick reminder, it was introduced by Ward Cunningham to describe the phenomenon that occurs when we use code that is easy to implement in the short run instead of applying the best overall solution we have identified.

It is by definition a conscious decision to take a shortcut for short-term gratification (like taking out a new credit card for a holiday) that in the long term will cost us extra time in development when improving the system (like paying interest on the credit card debt).

Until the capital is repaid in full we will always pay interest when adding features to the system, as the debt creates obstacles to adding new code efficiently.

Ward suggests that when we are rushing something out we need to be conscious that we will have to pay back the capital in the future, or the interest will cripple us.

Ward suggests refactoring as a solution to paying off technical debt.

I really like Ward’s vision and I want to expand the metaphor one level up.

Now if we replace the word “code” in my description of technical debt above with “process” we can define Process Debt.

Replacing, we get:

Process Debt is the phenomenon that occurs when we use a process that is easy to implement in the short run instead of applying the best overall solution we have identified.

Let’s look at an example.

You notice that lately the system seems to be more unstable than usual. You know this because there are more calls from customer care and more defects get raised.

Option 1: You want to get to the bottom of the situation and believe that a root cause analysis with the 5 whys could get you there. If you use this approach, you will probably identify a change to your process that helps you prevent some of these defects in the future.

Option 2: You implement a better policy for developers to select which defects to work on, reducing the time defects sit in the unresolved queue while keeping the throughput of new features relatively stable.

You know Option 1 is better in the long run because it will generate a change in the process that reduces the amount of rework. That means less time spent fixing and more time spent on new features, which in turn means higher throughput and lower lead time for new features.

Option 1 requires some investment. You need to hold one or more root cause analysis sessions, identify the problem(s), and experiment with solutions until you find one that mitigates your problem.

If you are under pressure, because the new product needs shipping and the old defects need fixing, you are likely to choose Option 2.

By doing this you have introduced process debt.

The capital of this debt is the change in the process you could have identified had you used Option 1.

Oh, yes, I forgot, change is difficult.

The worst part of it is the interest on the debt, which you will pay forever until you repay the capital. This interest will appear in the form of:
1. bugs keep on coming; we seem to be fixing bugs all the time
2. new features get delayed because a queue now builds up in front of the features to be delivered
3. the lead time of any new feature is inflated by a factor X
4. customers keep complaining because of your defects in production
5. your product owner starts to get annoyed at the amount of work the development team spends on defects
6. developers are tired of always fixing defects
7. …
n. need I say more?

This interest will be paid for the rest of your life if you don’t fix the problem.
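
To make the “factor X” in point 3 above a bit more tangible, here is a tiny back-of-the-envelope sketch in Python, with made-up numbers purely for illustration: if a fixed share of the team’s capacity is permanently eaten by defect fixing, every feature’s lead time is inflated by 1 / (1 - that share).

# Illustrative only: made-up numbers, not measurements from any real team.

def lead_time_inflation(defect_capacity_share: float) -> float:
    """Factor by which feature lead time grows when a fixed share of
    capacity is spent on defect fixing (the 'interest' on process debt)."""
    return 1.0 / (1.0 - defect_capacity_share)

baseline_lead_time_days = 10  # hypothetical lead time with no defect load

for share in (0.1, 0.3, 0.5):
    factor = lead_time_inflation(share)
    print(f"{share:.0%} of capacity on defects -> "
          f"lead time x{factor:.2f} ({baseline_lead_time_days * factor:.1f} days)")

Only Option 1 shrinks that share of capacity lost to rework; Option 2 merely reorders the queue.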

Months later you are buried in defects, stressed out by your customers, and you change job.

Is this healthy? Certainly not.

Is there a cure? Oh yes, indeed.

Continuous improvement has the same effect on process that refactoring has on software. It repays the capital of your process debt.

With small changes in the form of experiments, you will be able to clearly discuss the problems the team is having and make small tweaks continuously.

My esteemed colleague and fervid innovator Claudio Perrone presents his continuous improvement model, PopcornFlow, saying:

If change is hard, make it continuous

I could not agree more.

By making process change a continuous activity, we enable behaviours that will stay with teams forever and we unleash the power of people’s creativity in improving their own way of working.

 

 


On empowerment and the 2 types of downtime

Downtime

In a recent LinkedIn discussion, the topic of testing downtime and what testers should do during it was examined. I read and reread the answers a few times and became convinced that I have a completely different view of what downtime is and how it should be utilised.

The first thing I noticed was that test managers and leads had a substantial list of items to ask testers to do during their downtime, but none of them seemed interested in asking the testers themselves what they thought they should do in such a situation.

This made me reflect on the different levels of tester empowerment and the corresponding levels of manager trust. Asking a tester to do something during downtime is not a bad thing “tout court”; I can see some situations where it could be helpful. On the other hand, fostering an environment where we assume people are going to do nothing unless asked to do something can become a self-fulfilling prophecy.

I believe that by fostering and rewarding a culture of awareness and collaboration while assuming people’s good intent, leaders can help and motivate people to do things because they want to, not because they are asked to do them.

The second observation that made me reflect is that most of the activities proposed are activities that testers who work with me generally do as part of their normal day-to-day job, not as a one-off during downtime. Among things of this type, LinkedIn users suggested:

analyse coverage and issue root causes, learn automation, cross-skill with subject matter experts, review tests, document procedures on wikis, read a work-related book, blog, watch a video, self-paced training, knowledge transfer, and other “improving” activities.

This point made me think that maybe my downtime is different from other people’s downtime. We do most of the activities suggested in the thread (apart from some wasteful ones) as part of our normal work, so let me tell you what my 2 types of downtime are.

Good downtime, which we need to encourage, and Bad downtime, which we need to fight.

Good downtime: good downtime is planned downtime that we introduce into our process by limiting resource utilization, for example by setting WIP limits and allowing for slack. This downtime is designed so that people don’t burn out. During this time people can do whatever they want, from searching for funny memes on the web, to going for a walk in the park, or, if they prefer, studying something they want to learn; there is no limit to what people can do during good downtime. In my experience this downtime generally takes the form of 1 to 2-3 hours, at a cadence that depends on the specific context and can be fine-tuned to project and people’s needs.
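
To show what I mean by planned slack, here is a minimal sketch in Python of a Kanban-style column with a WIP limit; the names and numbers are invented, it is only an illustration of how a limit creates slack by refusing to pull more work.

# Minimal sketch: a column with a WIP limit refuses new work once full,
# which is what creates planned slack (good downtime) for the people on it.

class Column:
    def __init__(self, name: str, wip_limit: int):
        self.name = name
        self.wip_limit = wip_limit
        self.items: list[str] = []

    def pull(self, item: str) -> bool:
        """Pull a work item only if the WIP limit allows it."""
        if len(self.items) >= self.wip_limit:
            return False  # no capacity: the puller gets slack instead
        self.items.append(item)
        return True

testing = Column("Testing", wip_limit=3)
for story in ["story-1", "story-2", "story-3", "story-4"]:
    if not testing.pull(story):
        print(f"{story} stays upstream; {testing.name} is at its WIP limit, "
              f"so this is slack time (or time to swarm on work already in progress)")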

Bad downtime

Bad downtime: this is unexpected downtime caused by waiting for something to do because the flow of work is blocked somewhere else in the system. Say, for example, the build is broken and can’t be progressed for us to do exploratory testing, or the business analyst is on holiday and there is a shortage of new user stories coming through. This is bad downtime because it affects the flow of value towards our customer. In this case T-shaped testers can help a lot to fix the issue: they are able to support the developer stuck with the broken build or help with the creation of new user stories. When issues like the above happen, tools like Kanban can be extremely helpful, because you will be able to visualize the issue immediately in the form of a bottleneck. The next thing to do is for the team (including, but not limited to, the people in downtime) to swarm around the bottleneck, reduce it and restart the flow.

If you want to continuously improve your flow, swarming on and resolving bottlenecks is necessary but not sufficient. It is important that you resolve the root cause of the downtime and the related bottleneck. One effective way to expose bad downtime, so that we can identify patterns and fine-tune our process, is the waste snake; it is extremely easy to set up and use, and I’d strongly recommend it.
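
For those who haven’t seen one, the waste snake is simply a running log of wasted time that the team adds to every time bad downtime strikes. The sketch below uses hypothetical entries in Python to show how even a crude log makes the most expensive recurring cause jump out, ready for the next root cause analysis.

from collections import Counter

# Hypothetical waste-snake entries: (cause, hours lost). In real life these
# are sticky notes on a wall, added whenever someone hits bad downtime.
waste_snake = [
    ("broken build", 3),
    ("waiting for user stories", 2),
    ("broken build", 4),
    ("environment down", 1),
    ("broken build", 2),
]

hours_by_cause = Counter()
for cause, hours in waste_snake:
    hours_by_cause[cause] += hours

# The most expensive recurring cause is the first candidate for a root cause analysis.
for cause, hours in hours_by_cause.most_common():
    print(f"{cause}: {hours} hours lost")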

 

 

The 5 Stages of Expertise

Niels Bohr

It was more than 20 years ago when, in college, for the first time, I heard this:

An expert is a man who has made all the mistakes which can be made, in a narrow field.
Niels Bohr (Physicist, 1885 – 1962)

Initially I didn’t understand it, but it fascinated me, and with time I learned to appreciate it.

Let me demonstrate it for you.

I smoke cigarettes that I roll myself, and over the years I have screwed up in likely every possible way while rolling one, and learned a lot about how to make a cigarette even in the most adverse weather conditions, like gale force winds and pouring rain. I have never rolled during an earthquake, but if I can do it easily while driving a non-automatic car in city traffic, I can infer I could do it if the earth shook a little, so I can exclude this edge case. I can proudly claim to be an expert in the narrow field of “Rolling up cigarettes”.

How about domains more complex than rolling up cigarettes?

When we talk about a complex domain, like software testing for example, what is an expert? If we go by Bohr’s definition we could assert that an expert in software testing doesn’t exist, because it is physically impossible for any human being to make all the possible mistakes in such a complex domain in a single lifetime. I tend to think that Bohr’s quote is still valid for complex domains and that real experts in such domains don’t exist.

These are what I imagine the 5 stages of expertise to be:

The 5 stages of expertise

Since, as stage 4 says, learning is an infinite activity, there will obviously be a wide range of expertise within the same stage, and the closer you get to infinity the more of an expert you become. Is infinity relative to time? The amount of books read? The number of experiments run? Maybe all of them.

This stupid model was imagined by me, a person who believes that, in relation to software testing, he is in stage 4, “Perpetual Learner”.

The reason you don’t see the 5th stage is that I haven’t reached it; in fact, I don’t even know whether such a stage exists.

On the other hand, if the 5th stage existed, it would invalidate my very own model: the model states that “learning is an infinite activity”, which contradicts the existence of a 5th stage. So should I exclude the existence of a 5th stage so that my model works? Not really. I’d rather search for the reason that invalidates my model than defend it. Why? Because I learn more when I make mistakes than when I am right. How about you?

Continuous improvement is an infinite loop

This morning, at home with the flu, I thought about visualizing a continuous improvement process (that’s what crazy people do when they are sick). So I took markers and paper and I started to draw. Looking at it I realised that I was drawing an infinite loop. With my developer hat on I thought: WTF? There is a problem. I need to fix it.

Then I looked again, and it hit me. No problem at all, continuous improvement IS an infinite loop, and if you think you are done improving your process you are doomed!

You might say that the word continuous should have tipped me off. Yes, you are right, but I find that looking at a graphic representation of a concept always makes me discover new things; that’s why I draw things even though any 5-year-old could do a better job.

Markers + A4 paper + MsPaint + phone cam + Gliffy created this thing

Continuous Improvement